Gene tree parsimony (GTP) problems seek a species tree that infers the fewest evolutionary events necessary to explain discordance in a given collection of gene trees. Solving GTP problems allows evolutionary analyses of large gene families with complex evolutionary histories, for example, where gene duplication and loss events, or deep coalescence (incomplete lineage sorting), play a major role. Such problems are NP-hard, and therefore are addressed by heuristics, which lack any performance guarantee. We describe fast dynamic programming algorithms using bit-vector representation and Gray code enumeration that solve the GTP problems exactly, and in addition report the number of all optimal solutions. We show that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated studies, and compare our exact results with those of GTP heursistics.
展开▼