Distances: my personal experience

Just a few suggestions based on my own experience. For a more formal and detailed treatment, go to Wikipedia or a maths manual.
For a very rich directory on string similarity go to [http://www.dcs.shef.ac.uk/~sam/stringmetrics.html this link].

Quantitative Data Arrays

For quantitative data (e.g. transcription signals from microarray experiments) you can use

Binary Arrays

Hamming distance is not a good choice, in my opinion, when the strings being compared have very unequal proportions of 1s and 0s, and the 1s and 0s encode set membership (as in the examples above).
E.g., consider these strings, and the choice made by Hamming and Jaccard:

Mapping the first problem to interactions, let's say
A1 interacts with B, C
A2 interacts with B, D
A3 interacts with B, C, E, F, G
Is A1's neighborhood more similar to A2's (Hamming's choice) or to A3's (Jaccard's choice)?

In addition, neither of these two distances is suitable when some array positions are noisy whereas others are information-rich, as in the following fictional examples:
110 010 001
110 100 000
110 000 110
000 101 100
001 101 011
000 111 000
Clearly, the 1st and 2nd positions are always co-conserved, and the 4th and 6th agree in every row but the second; the co-conservation of the other positions is irregular. Therefore, we may think that the co-conserved positions hold more information than the other ones.
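Both examples above can be checked numerically. What follows is only a minimal sketch, not a reference implementation: the encoding of the neighborhoods as 0/1 strings over the columns B, C, D, E, F, G is my own assumption, and the helper names (encode, hamming, jaccard_distance, agreement) are hypothetical. Note that agreement counts shared 0s as well as shared 1s, which is only one of several possible ways to score co-conservation.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the positions set to 1."""
    ones_a = {i for i, x in enumerate(a) if x == "1"}
    ones_b = {i for i, x in enumerate(b) if x == "1"}
    union = ones_a | ones_b
    if not union:  # two all-zero strings: treat them as identical
        return 0.0
    return 1 - len(ones_a & ones_b) / len(union)


# Assumed encoding: one column per possible partner, in this order.
PARTNERS = "BCDEFG"


def encode(neighbors):
    """Turn a collection of partner names into a 0/1 string over PARTNERS."""
    return "".join("1" if p in neighbors else "0" for p in PARTNERS)


a1 = encode("BC")      # A1 interacts with B, C
a2 = encode("BD")      # A2 interacts with B, D
a3 = encode("BCEFG")   # A3 interacts with B, C, E, F, G

print("Hamming A1-A2:", hamming(a1, a2))   # 2, so Hamming prefers A2
print("Hamming A1-A3:", hamming(a1, a3))   # 3
print("Jaccard A1-A2:", round(jaccard_distance(a1, a2), 3))  # 0.667
print("Jaccard A1-A3:", round(jaccard_distance(a1, a3), 3))  # 0.6, so Jaccard prefers A3

# The fictional arrays from the text, with the spaces removed.
arrays = [
    "110010001",
    "110100000",
    "110000110",
    "000101100",
    "001101011",
    "000111000",
]


def agreement(rows, i, j):
    """Fraction of rows in which positions i and j (0-based) hold the same value."""
    return sum(row[i] == row[j] for row in rows) / len(rows)


# Position pairs (reported 1-based) that agree in every single row.
always_equal = [
    (i + 1, j + 1)
    for i, j in combinations(range(len(arrays[0])), 2)
    if agreement(arrays, i, j) == 1.0
]
print("Always co-conserved pairs:", always_equal)
print("Agreement of positions 4 and 6:", round(agreement(arrays, 3, 5), 3))
```

A per-pair score like this could then be used to weight positions before computing a distance, so that noisy positions count less; how to do that weighting is a design choice I leave open here.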

DanieleMerico/HowtoDirectory/Distances (last edited 2007-11-16 00:19:58 by DanieleMerico)
