Journal: Cluster Computing

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA

Abstract

Program development environments have made graphics processing units (GPUs) an attractive high-performance computing platform for the scientific community. A common problem in computational biology is searching protein databases for functional similarities. The most accurate algorithm for local sequence alignment is Smith-Waterman (SW). However, because of its computational complexity and rapidly growing database sizes, the search becomes increasingly time-consuming, which makes cluster-based systems more desirable. Scalable and highly parallel methods are therefore necessary to make SW a viable solution for life science researchers. In this paper we evaluate how SW fits onto the target GPU architecture by exploring ways to map the program architecture onto the processor architecture. We develop new techniques that reduce the memory footprint of the application while exploiting the GPU memory hierarchy. With this implementation, GSW, we overcome the on-chip memory size constraint and achieve a 23× speedup over a serial implementation. Results show that the speedup remains nearly constant as the query length increases, indicating that the approach scales well. Additionally, this is the first implementation of its kind that runs purely on the GPU rather than in an integrated CPU-GPU environment, which makes the design suitable for porting to a cluster of GPUs.
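The abstract does not spell out the recurrence or how GSW maps it onto the GPU. As a minimal sketch under assumed scoring (a linear gap penalty $g$ and simple match/mismatch scores, rather than the affine gaps and substitution matrices such as BLOSUM62 typically used for protein search), the SW score matrix $H$ for a query $q$ of length $m$ and a database sequence $d$ of length $n$ is

$$H_{i,0} = H_{0,j} = 0, \qquad H_{i,j} = \max\{0,\; H_{i-1,j-1} + s(q_i, d_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g\},$$

and the alignment score is $\max_{i,j} H_{i,j}$. The Python below contrasts a row-by-row serial fill with an anti-diagonal (wavefront) fill, a common way to expose parallelism on GPUs; it is not claimed to be the GSW design. The names `sw_serial`, `sw_wavefront`, `MATCH`, `MISMATCH` and `GAP` are hypothetical.

```python
# Smith-Waterman local-alignment scoring: a serial baseline and a wavefront
# (anti-diagonal) variant that keeps only three diagonals in memory.
# Assumption (not from the paper): linear gap penalty and match/mismatch
# scores instead of a substitution matrix.

MATCH, MISMATCH, GAP = 2, -1, 1  # hypothetical scoring parameters


def score(a: str, b: str) -> int:
    """Substitution score s(a, b) under the assumed match/mismatch scheme."""
    return MATCH if a == b else MISMATCH


def sw_serial(q: str, d: str) -> int:
    """Row-by-row fill of the full (m+1) x (n+1) matrix; O(m*n) memory."""
    m, n = len(q), len(d)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(q[i - 1], d[j - 1]),
                          H[i - 1][j] - GAP,
                          H[i][j - 1] - GAP)
            best = max(best, H[i][j])
    return best


def sw_wavefront(q: str, d: str) -> int:
    """Same recurrence, evaluated one anti-diagonal (i + j = k) at a time.

    Cells on one anti-diagonal depend only on the two previous diagonals,
    so they are independent of each other; on a GPU each could map to a
    thread. Only three buffers of length m+1, indexed by row i, are live.
    """
    m, n = len(q), len(d)
    prev2 = [0] * (m + 1)  # diagonal k-2 (entries never written stay 0)
    prev1 = [0] * (m + 1)  # diagonal k-1
    best = 0
    for k in range(2, m + n + 1):
        cur = [0] * (m + 1)
        # Valid rows on this diagonal: 1 <= i <= m and 1 <= j = k - i <= n.
        for i in range(max(1, k - n), min(m, k - 1) + 1):  # independent cells
            j = k - i
            cur[i] = max(0,
                         prev2[i - 1] + score(q[i - 1], d[j - 1]),  # H[i-1][j-1]
                         prev1[i - 1] - GAP,                         # H[i-1][j]
                         prev1[i] - GAP)                             # H[i][j-1]
            best = max(best, cur[i])
        prev2, prev1 = prev1, cur
    return best


print(sw_wavefront("ACGT", "TACGTA"))  # 8: "ACGT" aligns exactly
```

Keeping only the diagonals still needed cuts score storage from O(mn) to O(m), which illustrates the general kind of memory-footprint reduction the abstract refers to; the specific techniques GSW uses to fit the GPU memory hierarchy may differ.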