Improving Bioinformatics Analysis of Large Sequence Datasets Parallelizing Tools for Population Genomics

Abstract

Next-generation sequencing (NGS) technologies initiated a revolution in genomics, producing massive amounts of biological data and creating the need to adapt current computing infrastructures. Tools for multiple genome alignment, variant analysis, or phylogenetic tree construction have quadratic polynomial complexity in the best case and can take days or weeks to complete on conventional computers. Most of these analyses involve several tools integrated into workflows, and their computational load can often be divided into independent tasks that run in parallel. Choosing adequate load balancing, data partitioning, task granularity, and I/O tuning is key to achieving good speedups. In this paper we present a coarse-grain parallelization of GH caller (Genotype/Haplotype caller), a tool used in population genomics workflows that performs a probabilistic identification process to account for the frequency of variants among individuals in a population. It implements a master-worker model using the standard Message Passing Interface (MPI): the master process orchestrates the run, concurrently and iteratively distributing subsets of the data among the available worker processes. Our results show a 260x performance gain over the initial non-parallelized version, using 64 processes together with additional optimizations.
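The master-worker scheme described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it uses Python threads and `queue.Queue` as a single-machine stand-in for MPI processes and point-to-point messages, and it replaces GH caller's probabilistic genotype/haplotype model with a simple per-site alternate-allele frequency. The names `master`, `worker`, `alt_allele_frequency`, and `chunk_size`, and the 0/1/2 genotype encoding, are all assumptions. Because of Python's global interpreter lock, the sketch shows the coordination pattern rather than a real speedup.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one site = alternate-allele counts (0, 1 or 2)
# for each diploid individual in the population.
Site = List[int]
Chunk = Tuple[int, List[Site]]  # (chunk index, sites in that chunk)


def alt_allele_frequency(site: Site) -> float:
    """Stand-in per-site computation (NOT GH caller's probabilistic model)."""
    return sum(site) / (2 * len(site))


def worker(worker_id: int, inbox: "queue.Queue[Optional[Chunk]]",
           outbox: "queue.Queue[Tuple[int, int, List[float]]]") -> None:
    # Receive a chunk from the master, process it, send the result back.
    # None acts as the stop signal (analogous to a termination message in MPI).
    while True:
        task = inbox.get()
        if task is None:
            return
        chunk_id, sites = task
        outbox.put((worker_id, chunk_id, [alt_allele_frequency(s) for s in sites]))


def master(sites: List[Site], n_workers: int, chunk_size: int) -> List[float]:
    # Partition the data; chunk_size controls the task granularity.
    chunks: List[Chunk] = [(i, sites[start:start + chunk_size])
                           for i, start in enumerate(range(0, len(sites), chunk_size))]
    inboxes: List[queue.Queue] = [queue.Queue() for _ in range(n_workers)]
    outbox: queue.Queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(w, inboxes[w], outbox))
               for w in range(n_workers)]
    for t in threads:
        t.start()

    next_chunk = 0
    in_flight = 0

    def dispatch(w: int) -> None:
        # Send the next chunk to worker w, or stop it if no chunks remain.
        nonlocal next_chunk, in_flight
        if next_chunk < len(chunks):
            inboxes[w].put(chunks[next_chunk])
            next_chunk += 1
            in_flight += 1
        else:
            inboxes[w].put(None)

    # Seed every worker with one chunk.
    for w in range(n_workers):
        dispatch(w)

    results: Dict[int, List[float]] = {}
    # Demand-driven loop: whichever worker reports back first gets the next
    # chunk, which balances load when chunks take unequal time.
    while in_flight:
        w, chunk_id, partial = outbox.get()
        results[chunk_id] = partial
        in_flight -= 1
        dispatch(w)

    for t in threads:
        t.join()
    # Reassemble per-site results in the original site order.
    return [f for cid in sorted(results) for f in results[cid]]


if __name__ == "__main__":
    demo_sites = [[0, 1, 2], [2, 2, 2], [0, 0, 0], [1, 1, 0], [0, 2, 1]]
    print(master(demo_sites, n_workers=2, chunk_size=2))
    # prints [0.5, 1.0, 0.0, 0.3333333333333333, 0.5]
```

Seeding each worker with one chunk and then handing the next chunk to whichever worker finishes first is a common dynamic load-balancing strategy. In this sketch, `chunk_size` trades scheduling overhead (many small messages) against idle time near the end of the run (a few large chunks).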

Bibliographic Record

Source: Euro-Par 2016: Parallel Processing Workshops, 2016, pp. 457-467 (11 pages)
Conference location: Grenoble (FR)
Authors: Javier Navarro; Gonzalo Vera; Sebastian Ramos-Onsins; Porfidio Hernandez
Author affiliations:

Universitat Autonoma de Barcelona, Bellaterra, Spain;

Center for Research in Agricultural Genomics, Barcelona, Spain;

Center for Research in Agricultural Genomics, Barcelona, Spain;

Universitat Autonoma de Barcelona, Bellaterra, Spain;

Original format: PDF
Language: English
Keywords: Bioinformatics; NGS; Population genomics; Variant analysis; Parallelization; Scalability


Similar Literature

1. Bioinformatic Tools for the Analysis of Sugarcane Genomic Sequences [J]. Andrés Avalos, Luis Molina. Sugar Journal: Covering the World's Sugar Industry, 2014, No. 11.

2. Variability in HIV-1 partial genomic sequences in Costa Rican patients: analysis with different bioinformatics tools [J]. Taylor-Castillo Lizeth, León-Bratti María Paz, Solano-Chinchilla Antonio. Revista Panamericana de Salud Pública, 2010, No. 1.

3. PhyloToAST: Bioinformatics tools for species-level analysis and visualization of complex microbial datasets [J]. Shareef M. Dabdoub, Megan L. Fellows, Akshay D. Paropkari. Scientific Reports, 2016, No. 1.

4. Improving Bioinformatics Analysis of Large Sequence Datasets Parallelizing Tools for Population Genomics [C]. Javier Navarro, Gonzalo Vera, Sebastian Ramos-Onsins. International Conference on Parallel and Distributed Computing, 2017.

5. Bioinformatic analysis of viral genomic sequences and concepts of genome-specific rational vaccine design [D]. Chatterjee, Sharmistha P. 2013.

6. SeqBuster a bioinformatic tool for the processing and analysis of small RNAs datasets reveals ubiquitous miRNA modifications in human embryonic cells [O]. Lorena Pantano, Xavier Estivill, Eulàlia Martí. 2010.

7. Bioinformatics protocols for analysis of functional genomics data applied to neuropathy microarray datasets [O]. Diboun I. 2010.
