Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization

Abstract

Gene sequencing instruments are producing huge volumes of data, straining the capabilities of current database searching algorithms and hindering the efforts of researchers who analyze large collections of data to obtain greater insights. In the space of parallel genomic sequence search, most of the popular software packages, like mpiBLAST, use the database segmentation approach, wherein the entire database is sharded and searched on different nodes. However, this approach does not scale well with the increasing length of individual query sequences or with the rapid growth in size of sequence databases. In this paper, we propose a fine-grained parallelism technique, called Orion, that divides the input query into an adaptive number of fragments and shards the database. Our technique achieves higher parallelism (and hence speedup) and better load balancing than database sharding alone, while maintaining 100% accuracy. We show that it is 12.3X faster than mpiBLAST for solving a relevant comparative genomics problem.
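The decomposition described in the abstract (fragment the query, shard the database, search every fragment against every shard) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the fixed fragment length, the overlap between adjacent fragments, and the round-robin sharding are all assumptions made for illustration; the abstract does not say how the number of fragments is adapted or how partial results are merged.

```python
from itertools import product


def fragment_query(query: str, fragment_len: int, overlap: int) -> list[tuple[int, str]]:
    """Split a query into (offset, fragment) pairs.

    Adjacent fragments share `overlap` characters (an assumption of this
    sketch) so that a match spanning a fragment boundary is not lost.
    """
    if fragment_len <= overlap:
        raise ValueError("fragment_len must be greater than overlap")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start, query[start:start + fragment_len]))
        if start + fragment_len >= len(query):
            break  # the last fragment reaches the end of the query
        start += step
    return fragments


def shard_database(sequences: list[str], num_shards: int) -> list[list[str]]:
    """Distribute database sequences round-robin over `num_shards` shards."""
    return [sequences[i::num_shards] for i in range(num_shards)]


def make_work_units(query: str, sequences: list[str],
                    fragment_len: int, overlap: int,
                    num_shards: int) -> list[tuple[int, str, int]]:
    """Build one (offset, fragment, shard_id) work unit per fragment/shard pair.

    Database sharding alone would yield only `num_shards` units; adding
    query fragmentation multiplies that by the number of fragments.
    """
    fragments = fragment_query(query, fragment_len, overlap)
    shards = shard_database(sequences, num_shards)
    return [(offset, frag, shard_id)
            for (offset, frag), shard_id in product(fragments, range(len(shards)))]


if __name__ == "__main__":
    units = make_work_units("ACGTACGTAC", ["s1", "s2", "s3", "s4", "s5"],
                            fragment_len=4, overlap=2, num_shards=3)
    for unit in units:
        print(unit)
```

In this sketch a 10-character query with 4-character fragments and a 2-character overlap produces 4 fragments, so 3 database shards give 12 independent work units instead of 3. Each unit keeps its query offset so that hits could later be mapped back to positions in the full query.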

Bibliographic details

Source
International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 449-460 (12 pages)
Authors
Mahadik Kanak; Chaterji Somali; Bowen Zhou; Kulkarni Milind; Bagchi Saurabh
Original format: PDF
Keywords
biology computing; database management systems; genomics; query processing; string matching; Orion; database segmentation; fine-grained parallelization; genomic sequence matching; query sequence; Bioinformatics; DNA; Databases; Genomics; Organisms; Parallel processing
