Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method.

Abstract

The Black-Box Adaptive Fast Multipole Method (bbAFMM) has attracted interest within the high-performance computing community as a tractable solution to the well-known n-body problem. The bbAFMM approximates the n-body solution using a series of independent functions, or kernels, that lend themselves to high-performance implementation on one or more graphics processor unit (GPU) devices. This work presents the analysis and implementation of the direct-interaction kernel, called the particle-to-particle (P2P) kernel, for a single shared-memory GPU device using the Compute Unified Device Architecture (CUDA), and reports a speedup of more than 500 times over the corresponding serial central processing unit (CPU) implementation. The objective is both to document the implementation of the GPU kernel and to explain the observed performance through an algorithmic analysis that focuses on arithmetic intensity, GPU memory bandwidth, GPU peak performance, and the defined Peripheral Component Interconnect Express (PCIe) bandwidth.
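
As a rough illustration of the ideas named in the abstract, the sketch below shows a serial direct (particle-to-particle) sum and a roofline-style performance bound. It is not taken from the report. The 1/r interaction, the function names `p2p_direct` and `roofline_bound`, and the hardware numbers in the usage note are all assumptions chosen for illustration. The roofline bound is one common way to relate arithmetic intensity, memory bandwidth, and peak performance, and may differ from the analysis in the report.

```python
import math


def p2p_direct(targets, sources, charges):
    """Serial direct sum: potential at each target from every source.

    Assumes a 1/r interaction (hypothetical choice; the abstract does not
    state which interaction kernel is used). Coincident points are skipped
    to avoid division by zero.
    """
    potentials = []
    for tx, ty, tz in targets:
        total = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:
                total += q / r
        potentials.append(total)
    return potentials


def roofline_bound(flops_per_byte, peak_gflops, bandwidth_gbs):
    """Roofline-style bound on attainable GFLOP/s.

    A kernel is limited either by peak compute or by how fast memory can
    feed it: min(peak, arithmetic intensity * bandwidth).
    """
    return min(peak_gflops, flops_per_byte * bandwidth_gbs)
```

For example, `p2p_direct([(0, 0, 0)], [(1, 0, 0), (0, 2, 0)], [1.0, 2.0])` sums 1/1 and 2/2. With made-up hardware figures of 1000 GFLOP/s peak and 200 GB/s bandwidth, `roofline_bound(0.5, 1000.0, 200.0)` is bandwidth-limited, while `roofline_bound(10.0, 1000.0, 200.0)` is compute-limited.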

Bibliographic record

Authors
Haney, R. H.; Darve, E.; Ansari, M. P.; Pataki, R.; AminFar, A.; Shires, D.
Year: 2015
Pages: 1-20
Total pages: 20
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords
Algorithms; Central processing units; Coding; High performance computing; Memory devices; N-body problem; P2P (particle-to-particle); GPU (graphics processor unit); bbAFMM (black-box adaptive fast multipole method); Fast multipole method; CUDA (Compute Unified Device Architecture); PCIe (Peripheral Component Interconnect Express)


