DIMACS TR: 2001-02

Multiple sequence alignment using the quasi-concave function optimization based on the DIALIGN combinatorial structures



Authors: Leonid Shvartser, Casimir Kulikowski, Ilya Muchnik

ABSTRACT

Multiple sequence alignment is usually considered as an optimization problem, which has a statistical and a structural component. It is known that in the problem of protein sequence alignment a processed sample is too small and not representative in the statistical sense though this information can be sufficient if an appropriate structural model is used. In order to utilize this information a new structural description of the pairwise alignment results union has been developed. It is shown that if the structure is restored then Multiple Sequence Alignment is achieved. Introduced structure represents the set of local maximums of quasi-concave set function on a lower semi lattice, which in turn is a union of the set-theoretical intervals. This union is a set of the consistent subsets of diagonals, introduced by B. Morgenstern, A. Dress, and T. Werner (1996). Algorithm for local maximums search on proposed structure has been developed. It consists of an alternation of the Forward and Backward passes. The Backward pass in this algorithm is a rigorous while the Forward pass is based on heuristics. Multiple alignment of 5 protein sequences are used as an illustration of the proposed algorithm.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2001/2001-02.ps.gz
DIMACS Home Page