DIMACS TR: 2007-21

Algorithms and Heuristics for Constrained Generalized Tree Alignment Problem



Author: Dr. Srikrishnan Divakaran

ABSTRACT

In generalized tree alignment problem, we are given a set S of k biologically related sequences and we are interested in a minimum cost evolutionary tree for S. In many instances of this problem partial topology of the phylogenetic tree for S is known. In such instances, we would like to make use of this knowledge to restrict the tree topologies that we consider and construct a biologically relevant minimum cost evolutionary tree. So, in this paper we propose the following natural generalization of the generalized tree alignment problem, a problem known to be MAX-SNP Hard, stated as follows:
Constrained Generalized Tree Alignment Problem: Given a set S of k related sequences and a phylogenetic forest comprising of node-disjoint phylogenetic trees that specify the topological constraints that an evolutionary tree of S needs to satisfy, construct a minimum cost evolutionary tree for S.
In this paper, we present constant approximation algorithms for the constrained generalized tree alignment problem. For the generalized tree alignment problem, a special case of this problem, our algorithms provide a guaranteed error bound of 2-2/k and do not exclude any tree topology a priori.

Paper Available at: ftp://dimacs.rutgers.edu/pub/dimacs/TechnicalReports/TechReports/2007/2007-21.pdf
DIMACS Home Page