Background: Forward-time population genetic simulations play a central role in deriving and testing evolutionary hypotheses. Such simulations may be data-intensive, depending on the settings to the various parameters controlling them. In particular, for certain settings, the data footprint may quickly exceed the memory of a single compute node.

Results: We develop a novel and general method for addressing the memory issue inherent in forward-time simulations by compressing and decompressing, in real-time, active and ancestral genotypes, while carefully accounting for the time overhead. We propose a general graph data structure for compressing the genotype space explored during a simulation run, along with efficient algorithms for constructing and updating compressed genotypes which support both mutation and recombination. We tested the performance of our method in very large-scale simulations. Results show that our method not only scales well, but that it also overcomes memory issues that would cripple existing tools.

Conclusions: As evolutionary analyses are being increasingly performed on genomes, pathways, and networks, particularly in the era of systems biology, scaling population genetic simulators to handle large-scale simulations is crucial. We believe our method offers a significant step in that direction. Further, the techniques we provide are generic and can be integrated with existing population genetic simulators to boost their performance in terms of memory usage.