Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization

Abstract

Gene sequencing instruments are producing huge volumes of data, straining the capabilities of current database search algorithms and hindering the efforts of researchers who analyze large collections of data to obtain greater insights. In parallel genomic sequence search, most popular software packages, such as mpiBLAST, use the database segmentation approach, in which the entire database is sharded and the shards are searched on different nodes. However, this approach does not scale well with the increasing length of individual query sequences or with the rapid growth in the size of sequence databases. In this paper, we propose a fine-grained parallelism technique, called Orion, that divides the input query into an adaptive number of fragments and also shards the database. Our technique achieves higher parallelism (and hence speedup) and better load balancing than database sharding alone, while maintaining 100% accuracy. We show that it is 12.3X faster than mpiBLAST for solving a relevant comparative genomics problem.
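
To make the idea of combining query fragmentation with database sharding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fixed fragment length with a fixed overlap between neighboring fragments (the abstract describes an adaptive number of fragments and does not state how fragment boundaries are handled), and the names `fragment_query` and `make_work_units` are hypothetical. Each work unit pairs one query fragment with one database shard, so the number of independent tasks is the product of the two counts rather than the shard count alone.

```python
from itertools import product


def fragment_query(query: str, fragment_len: int, overlap: int) -> list[tuple[int, str]]:
    """Split a query into overlapping fragments.

    Returns (offset, fragment) pairs. The overlap is an assumption of this
    sketch, intended to keep matches that span a fragment boundary visible
    to at least one fragment.
    """
    if fragment_len <= overlap:
        raise ValueError("fragment_len must exceed overlap")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start, query[start:start + fragment_len]))
        if start + fragment_len >= len(query):
            break  # this fragment reached the end of the query
        start += step
    return fragments


def make_work_units(query: str, num_shards: int, fragment_len: int, overlap: int):
    """Pair every query fragment with every database shard index."""
    fragments = fragment_query(query, fragment_len, overlap)
    return [
        (offset, fragment, shard)
        for (offset, fragment), shard in product(fragments, range(num_shards))
    ]


# Toy input: a 20-base query searched against a database split into 3 shards.
units = make_work_units("ACGTACGTACGTACGTACGT", num_shards=3, fragment_len=8, overlap=2)
print(len(units))
```

With database sharding alone, this toy input would yield 3 tasks, one per shard. Fragmenting the query as well multiplies the task count by the number of fragments, which is the source of the additional parallelism the abstract refers to. A real system would also need to merge per-fragment results back into alignments on the full query, which this sketch omits.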