How to obtain efficient GPU kernels: An illustration using FMM & FGT algorithms

Cruz F.A.; Layton S.K.; Barba L.A.


Abstract

Computing on graphics processors is maybe one of the most important developments in computational science to happen in decades. Not since the arrival of the Beowulf cluster, which combined open source software with commodity hardware to truly democratize high-performance computing, has the community been so electrified. Like then, the opportunity comes with challenges. The formulation of scientific algorithms to take advantage of the performance offered by the new architecture requires rethinking core methods. Here, we have tackled fast summation algorithms (fast multipole method and fast Gauss transform), and applied algorithmic redesign for attaining performance on GPUs. The progression of performance improvements attained illustrates the exercise of formulating algorithms for the massively parallel architecture of the GPU. The end result has been GPU kernels that run at over 500 Gflop/s on one NVIDIA Tesla C1060 card, thereby reaching close to practical peak.


Bibliographic details

Source: Computer Physics Communications, 2011, Issue 10, 15 pages
Authors: Cruz F.A.; Layton S.K.; Barba L.A.
Original format: PDF
Language: English
Chinese Library Classification: Computer applications
Keywords: Fast Gauss transform; Fast multipole method; Fast summation methods; Heterogeneous computing
Added to database: 2022-08-18 09:39:18

