Journal of Computational Physics

Fast evaluation of Helmholtz potential on graphics processing units (GPUs)

Abstract

This paper presents a parallel algorithm implemented on graphics processing units (GPUs) for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements the non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside the source domain. NGIM reduces the computational cost of the direct field evaluation at N observers due to N co-located sources from O(N^2) to O(N) in the static and low-frequency regimes, to O(N log N) in the high-frequency regime, and to a cost between these in the mixed-frequency regime. Memory requirements scale as O(N) in all frequency regimes. Achieving optimal performance on the respective platforms requires several important differences between the CPU and GPU implementations of the NGIM. In particular, in the CPU implementations all operations that can be pre-computed are evaluated in a preprocessing stage and stored in memory; this reduces the computational time but significantly increases the memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory-handling techniques are used to accelerate the computations. The significant latency of GPU global memory access is hidden by coalesced reading, which requires arranging many array elements in contiguous regions of memory. In contrast to the CPU version, most of the steps in the GPU implementations are executed on the fly, and only the necessary arrays are kept in memory. This results in significantly reduced memory consumption, an increased problem size N that can be handled, and reduced computational time on GPUs. The obtained GPU-CPU speed-up ratios range from 150 to 400, depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.
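
The paper itself does not include source code; the listing below is a minimal CUDA sketch of the brute-force O(N^2) reference sum u(r_m) = sum_n q_n exp(ik|r_m - r_n|) / (4*pi*|r_m - r_n|) that NGIM accelerates, written only to illustrate the kind of coalesced, tiled global-memory access the abstract describes. The kernel name, the TILE size, and the real-valued source amplitudes are illustrative assumptions, not the authors' implementation.

```cuda
// Direct O(N^2) evaluation of the Helmholtz potential on the GPU (hypothetical sketch).
// Sources are staged through shared memory in tiles so that consecutive threads read
// consecutive global-memory elements (coalesced access). Launch with blockDim.x == TILE,
// e.g. helmholtzDirectKernel<<<(nObs + TILE - 1) / TILE, TILE>>>(...).

#include <cuda_runtime.h>
#include <math_constants.h>

#define TILE 256  // threads per block = sources loaded per tile

__global__ void helmholtzDirectKernel(const float4* __restrict__ src,  // (x, y, z, q)
                                      const float3* __restrict__ obs,  // observer positions
                                      float2* __restrict__ field,      // complex field (re, im)
                                      int nSrc, int nObs, float k)     // k = wavenumber
{
    __shared__ float4 tile[TILE];

    int m = blockIdx.x * blockDim.x + threadIdx.x;   // observer index
    float3 rm = (m < nObs) ? obs[m] : make_float3(0.f, 0.f, 0.f);
    float2 acc = make_float2(0.f, 0.f);

    for (int base = 0; base < nSrc; base += TILE) {
        int n = base + threadIdx.x;
        // Coalesced load: thread t of the block reads source element base + t.
        tile[threadIdx.x] = (n < nSrc) ? src[n] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();

        int limit = min(TILE, nSrc - base);
        for (int j = 0; j < limit; ++j) {
            float4 s = tile[j];
            float dx = rm.x - s.x, dy = rm.y - s.y, dz = rm.z - s.z;
            float r = sqrtf(dx * dx + dy * dy + dz * dz);
            if (r > 1e-7f) {                          // skip self-interaction
                float g = s.w / (4.f * CUDART_PI_F * r);
                float sinkr, coskr;
                sincosf(k * r, &sinkr, &coskr);
                acc.x += g * coskr;                   // Re{ q * exp(ikr) / (4*pi*r) }
                acc.y += g * sinkr;                   // Im{ q * exp(ikr) / (4*pi*r) }
            }
        }
        __syncthreads();
    }

    if (m < nObs) field[m] = acc;
}
```

This direct kernel is the baseline against which the reported 150-400x GPU-CPU speed-ups of the NGIM-accelerated evaluation would be compared; the NGIM itself replaces most of these pairwise interactions with sparse-grid interpolation plus amplitude and phase compensation.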