Conference: International conference on bioinformatics and computational biology

HSPp-BLAST: Highly Scalable Parallel PSI-BLAST for Very Large-scale Sequence Searches

Abstract

According to recently published articles, revolutionary advances in next-generation sequencing technologies have caused genomic data to grow faster than both storage technology and processing power have improved. Because these technologies have lowered costs and raised throughput by many orders of magnitude, the volume of data is doubling every 9 months, producing exponential growth in genomic data in recent years. Data analysis, however, is becoming increasingly difficult and can be prohibitive, because the bioinformatics tools developed over the past decade mainly target desktops, workstations, and small clusters with limited capabilities. Improving the performance and scalability of such tools is critical to transforming ever-growing raw genomic data into biological knowledge that bears directly on human health. This paper describes a new software application that applies optimization techniques to improve the scalability of PSI-BLAST, one of the most widely used bioinformatics tools, on advanced parallel architectures. We show that our improvements allow near-linear scaling to tens of thousands of processing cores, up to the maximum non-capability size on current petaflop supercomputers. The new tool increases the amount of genomic data that can be processed per hour by five orders of magnitude.
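
As a rough illustration of the two quantitative ideas in the abstract, the sketch below computes the growth implied by a 9-month doubling time and a conventional strong-scaling efficiency figure. The function names and the timings in the example run are hypothetical and are not taken from the paper; the efficiency formula is the standard speedup-over-ideal ratio, which is only assumed to match what the authors mean by "near-linear scaling".

```python
def growth_factor(months_elapsed: float, doubling_months: float = 9.0) -> float:
    """Factor by which data volume grows over `months_elapsed`,
    assuming a fixed doubling time (9 months, per the abstract)."""
    return 2.0 ** (months_elapsed / doubling_months)


def parallel_efficiency(t_base: float, cores_base: int,
                        t_scaled: float, cores_scaled: int) -> float:
    """Strong-scaling efficiency relative to a baseline run.
    A value of 1.0 means perfectly linear scaling."""
    speedup = t_base / t_scaled          # measured speedup over the baseline
    ideal = cores_scaled / cores_base    # speedup expected from linear scaling
    return speedup / ideal


if __name__ == "__main__":
    # Growth implied by doubling every 9 months.
    print(f"growth over 1 year:   {growth_factor(12):.2f}x")
    print(f"growth over 10 years: {growth_factor(120):.0f}x")

    # Hypothetical wall-clock times (seconds), purely for illustration.
    eff = parallel_efficiency(t_base=1000.0, cores_base=1024,
                              t_scaled=68.0, cores_scaled=16384)
    print(f"efficiency at 16384 cores: {eff:.2f}")
```

An efficiency close to 1.0 across increasing core counts is what "near-linear scaling" usually denotes; the paper's own measurements are not reproduced here.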
