Assembling High-resolution Restriction Maps Using Multiple Complete Digestions

Will Gillett and Maynard V.0lson*

Computer Science Department, Washington Univ. St. Louis, MO 63108
* Department of Molecular Biotechnology Univ. of Washington, Seattle, WA 98195

Multiple-Complete-Digestion (MCD) mapping is an extension of the techniques used by Olson, et al. (1986). in mapping the S. cerevisiae genome. In this original approach, greedy algorithms were used to bind together fragments of similar length in different clones. Using techniques such as exhaustively accounting for all fragments and allowing only unbranching maps were major factors which allowed to project to construct very accurate highly-structured restriction-fragment maps (Riles, et al., 1993).

MCD mapping extends the basic approach by using multiple complete digestions (using a suite of say 2 or 3 restriction enzymes) of the clone set. Maps in the different restriction enzyme domains are simultaneously constructed, each map containing the same clones and each clone residing in "the same position" in each map. In the extended paradigm, maps encode not only the relative positioning of the fragments but also the relative positioning of the clone ends. Since the clone ends exist independently of any restriction enzyme being mapped and there is some total ordering of these clone ends in the underlying genome, the presence of the same clone ends in the different mapping domains can be used to enforce consistent positioning of the clones in "the same position" in all mapping domains. This significantly increases the accuracy by which clones (and fragments) can be placed. The uncertainty (chance of placing a clone in an incorrect position) decreases exponentially as a function of the number of restriction enzymes being used. The real power of MCD mapping lies in its ability to detect errors in one or more of the mapping domains by observing inconsistent clone placement across the domains.

The maps produced by MCD mapping can be used as a preprocessing step for Double Digest Mapping. If MCD mapping is done in three mapping domains, two single digests and one double digest, the computational complexity of double digest mapping (applied to this triplet of maps) can be made, in practice, linear in the length of the maps. This is because the clone ends present in each map synchronize across the maps, delimiting small regions of fragments to which "standard" double digest mapping algorithms can be (effectively and efficiently) applied. Being able to apply the technique to small regions of fragments also reduces other problems previously inherent to double digest mapping, the absence of small fragments and numerous multiple solutions.

The development of these concepts and the implementation of new techniques to address them have introduced a number of interesting theoretical and practical problems. For instance, our work addresses the major problem inherent in applying greedy algorithms to fragment-length data, i.e., fragment confusion, which take the observed form of ambiguity and fragment collapsing. Fragment collapsing occurs when two different genomic fragments of apparently the same length in two overlapping clones (but not in the overlapping region) "collapse" into one fragment in the overlapping region of the map. Our techniques can sometimes detect these collapsed fragments and attempt to "split" them. We have recently found a linear algorithm for "splitting a fragment", a problem which we had previously believed was exponential (in the depth of the map). Solving this last major impediment, along with a number of other uncertainty-reducing techniques, makes MCD mapping an extremely powerful and accurate mapping paradigm. It can be used effectively in a number of different mapping situations, from mapping cosmids within a YAC to constructing sequence-ready maps from phagemids.

Program

DIMACS Homepage

Contacting the Center
Document last modified on March 28, 2000.