Journal: Computer Physics Communications

How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms


Abstract

Computing on graphics processors is arguably one of the most important developments in computational science in decades. Not since the arrival of the Beowulf cluster, which combined open-source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. As then, the opportunity comes with challenges. Formulating scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (the fast multipole method and the fast Gauss transform) and applied algorithmic redesign to attain performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 Gflop/s on one NVIDIA Tesla C1060 card, thereby reaching close to practical peak.
