DIMACS TR: 2005-39

An evaluation of the edit-distance-with-moves similarity metric for comparing genetic sequences



Authors: Shiri Azenkot, Tzu-Yi Chen and Graham Cormode

ABSTRACT

We describe the first known implementation of an approximation algorithm for the string edit distance with moves similarity metric. This is the first algorithm to consider nontrivial alignment and run in substantially sub-quadratic time [2]. Extensive experimentation demonstrates that the algorithm produces a good approximation for the edit distance with moves metric, especially on strings of length 500B to 10KB. We also found that the algorithm has high potential for use in computational biology. When comparing texts of genetic sequences, our algorithm outperforms the q-grams heuristic in predicting results of the Smith-Waterman algorithm. Finally, we propose additional application areas for our implementation.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2005/2005-39.ps.gz
DIMACS Home Page