Conference: International Conference on Bioinformatics and Computational Biology

HSPp-BLAST: Highly Scalable Parallel PSI-BLAST for Very Large-scale Sequence Searches

Abstract

According to recently published articles, revolutionary advances in next-generation sequencing technologies have caused the growth of genomic data to outpace improvements in both storage technology and processing power. Because these technologies have cut costs and raised throughput by many orders of magnitude, the volume of data is doubling every 9 months, which has produced exponential growth in genomic data in recent years. Data analysis, however, is becoming increasingly difficult and can be prohibitive, because existing bioinformatics tools developed in the past decade mainly target desktops, workstations, and small clusters with limited capabilities. Improving the performance and scalability of such tools is critical to transforming ever-growing raw genomic data into biological knowledge that contains invaluable information directly related to human health. This paper describes a new software application that incorporates optimization techniques to improve the scalability of PSI-BLAST, one of the most widely used bioinformatics tools, on advanced parallel architectures, pushing the envelope of biological data analysis. We show that our improvements allow near-linear scaling to tens of thousands of processing cores, up to the maximum non-capability size on current petaflop supercomputers. The new tool increases the amount of genomic data that can be processed per hour by 5 orders of magnitude.