A Novel Heuristic for Data Distribution in Massively Parallel Phylogenetic Inference Using Site Repeats


Abstract

Continuous advances in molecular sequencing technologies now make it possible to infer, on supercomputers, evolutionary trees (phylogenies) that comprise hundreds to thousands of species at the whole-transcriptome or whole-genome level. The phylogenetic likelihood function (PLF) consumes 90-95% of total execution time in such analyses and is therefore typically parallelized. Recently, the site repeats (SR) technique was introduced to substantially accelerate the PLF. It identifies repeating patterns in parts of the likelihood computation and omits the corresponding redundant calculations to save time and space. However, the SR technique induces a parallel load imbalance. In this paper, we introduce a novel randomized data distribution algorithm (RDDA) that improves load balance for SR-based likelihood calculations. The algorithm is available as open-source code, induces minimal run-time overhead, and yields run-time improvements of up to 25% on empirical datasets and up to 50% for a synthetic, worst-case scenario. This improvement is substantial, as current evolutionary data analyses may require tens of millions of core hours on supercomputer systems.
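
The load imbalance mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's RDDA, whose details the abstract does not give. It rests on a simplifying assumption: the work a worker performs under site repeats is proportional to the number of distinct site columns it holds. The function names and the toy alignment are hypothetical.

```python
import random

def unique_patterns(columns):
    """Count distinct site columns; under the assumption above, only these cost work."""
    return len(set(columns))

def block_split(columns, workers):
    """Assign contiguous, equally sized blocks of sites to workers."""
    size = -(-len(columns) // workers)  # ceiling division
    return [columns[i:i + size] for i in range(0, len(columns), size)]

def shuffled_split(columns, workers, seed=0):
    """Shuffle sites with a fixed seed, then split into contiguous blocks."""
    shuffled = list(columns)
    random.Random(seed).shuffle(shuffled)
    return block_split(shuffled, workers)

# Toy two-taxon columns: the first half is highly repetitive, the second half is diverse.
columns = ["AA"] * 8 + ["AC", "AG", "AT", "CA", "CC", "CG", "CT", "GA"]

# Per-worker work estimate for each distribution scheme.
block_loads = [unique_patterns(part) for part in block_split(columns, 2)]
shuffled_loads = [unique_patterns(part) for part in shuffled_split(columns, 2)]

print("contiguous blocks:", block_loads)
print("shuffled sites:   ", shuffled_loads)
```

With contiguous blocks, one worker receives only the repeated column and the other receives all eight distinct ones, so both hold the same number of sites but very different amounts of work. Shuffling the sites before splitting tends to even out the per-worker counts, which is the general intuition behind randomizing the data distribution.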


Bibliographic details

Source
IEEE International Conference on High Performance Computing and Communications; IEEE International Conference on Smart City; IEEE International Conference on Data Science and Systems | 2017 | pp. 81-88 | 8 pages
Authors
Benoit Morel; Tomáš Flouri; Alexandros Stamatakis
Original format
PDF
Keywords
Vegetation; Partitioning algorithms; Phylogeny; Tools; Topology; Computational efficiency; Conferences

Similar literature

1. Data exploration in phylogenetic inference: scientific, heuristic, or neither [J]. Taran Grant, Arnold G. Kluge. Cladistics: The international journal of the Willi Hennig Society. 2003, No. 5
2. Parallel inference for massive distributed spatial data using low-rank models [J]. Katzfuss Matthias, Hammerling Dorit. Statistics and computing. 2017, No. 2
3. Approximate inference for spatial functional data on massively parallel processors [J]. Lars Lau Rakêt, Bo Markussen. Computational statistics & data analysis. 2014
4. A Novel Heuristic for Data Distribution in Massively Parallel Phylogenetic Inference Using Site Repeats [C]. Benoit Morel, Tomáš Flouri, Alexandros Stamatakis. IEEE International Conference on High Performance Computing and Communications. 2017
5. Development of a comprehensive massively parallel sequencing panel of single nucleotide polymorphism and short tandem repeat markers for human identification [D]. Warshauer, David H. 2015
6. ParGenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes [O]. Benoit Morel, Alexey M Kozlov, Alexandros Stamatakis
7. Data Distribution for Phylogenetic Inference with Site Repeats via Judicious Hypergraph Partitioning [O]. Ivo Baar, Lukas Hübner, Peter Oettig. 2019
