Comparative genomics is the study of the relationship of genome structure and function across different biological species or strains. The starting point for any comparison of mammalian genomes is the computation of exact matches between their DNA sequences. It is well-known how to do this time-efficiently with full-text index structures like the suffix tree or the suffix array. The space consumption of these indexes is often the limiting factor in whole-genome comparative projects and other large-scale applications. Fortunately, in the last years research on compressed full-text index structures has flourished, and algorithms for the computation of common k-mers and maximal unique matches on compressed indexes have been proposed. However, for the important class of maximal exact matches such an algorithm has not been provided. In this paper, we present the first algorithm for the computation of maximal exact matches on a compressed full-text index.
展开▼