Journal of Computational Physics

Fast evaluation of Helmholtz potential on graphics processing units (GPUs)

Abstract

This paper presents a parallel algorithm implemented on graphics processing units (GPUs) for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements the non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside the source domain. NGIM reduces the computational cost of the direct field evaluation at N observers due to N co-located sources from O(N^2) to O(N) in the static and low-frequency regimes, to O(N log N) in the high-frequency regime, and to a cost between these in the mixed-frequency regime. Memory requirements scale as O(N) in all frequency regimes. Achieving optimal performance on the respective platforms requires several important differences between the CPU and GPU implementations of the NGIM. In particular, in the CPU implementations all operations that can be pre-computed are evaluated in a preprocessing stage and stored in memory; this reduces the computational time but significantly increases the memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory-handling techniques are used to accelerate the computations. The significant latency of GPU global memory access is hidden by coalesced reading, which requires arranging many array elements in contiguous regions of memory. In contrast to the CPU version, most of the steps in the GPU implementations are executed on the fly, and only the necessary arrays are kept in memory. This results in significantly reduced memory consumption, an increased problem size N that can be handled, and reduced computational time on GPUs. The obtained GPU-CPU speed-up ratios range from 150 to 400, depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.
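
The paper itself does not include source code; the listing below is a minimal CUDA sketch of the brute-force O(N^2) reference sum u(r_m) = sum_n q_n exp(ik|r_m - r_n|) / (4*pi*|r_m - r_n|) that NGIM accelerates, written only to illustrate the kind of coalesced, tiled global-memory access the abstract describes. The kernel name, the TILE size, and the real-valued source amplitudes are illustrative assumptions, not the authors' implementation.

```cuda
// Direct O(N^2) evaluation of the Helmholtz potential on the GPU (hypothetical sketch).
// Sources are staged through shared memory in tiles so that consecutive threads read
// consecutive global-memory elements (coalesced access). Launch with blockDim.x == TILE,
// e.g. helmholtzDirectKernel<<<(nObs + TILE - 1) / TILE, TILE>>>(...).

#include <cuda_runtime.h>
#include <math_constants.h>

#define TILE 256  // threads per block = sources loaded per tile

__global__ void helmholtzDirectKernel(const float4* __restrict__ src,  // (x, y, z, q)
                                      const float3* __restrict__ obs,  // observer positions
                                      float2* __restrict__ field,      // complex field (re, im)
                                      int nSrc, int nObs, float k)     // k = wavenumber
{
    __shared__ float4 tile[TILE];

    int m = blockIdx.x * blockDim.x + threadIdx.x;   // observer index
    float3 rm = (m < nObs) ? obs[m] : make_float3(0.f, 0.f, 0.f);
    float2 acc = make_float2(0.f, 0.f);

    for (int base = 0; base < nSrc; base += TILE) {
        int n = base + threadIdx.x;
        // Coalesced load: thread t of the block reads source element base + t.
        tile[threadIdx.x] = (n < nSrc) ? src[n] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();

        int limit = min(TILE, nSrc - base);
        for (int j = 0; j < limit; ++j) {
            float4 s = tile[j];
            float dx = rm.x - s.x, dy = rm.y - s.y, dz = rm.z - s.z;
            float r = sqrtf(dx * dx + dy * dy + dz * dz);
            if (r > 1e-7f) {                          // skip self-interaction
                float g = s.w / (4.f * CUDART_PI_F * r);
                float sinkr, coskr;
                sincosf(k * r, &sinkr, &coskr);
                acc.x += g * coskr;                   // Re{ q * exp(ikr) / (4*pi*r) }
                acc.y += g * sinkr;                   // Im{ q * exp(ikr) / (4*pi*r) }
            }
        }
        __syncthreads();
    }

    if (m < nObs) field[m] = acc;
}
```

This direct kernel is the baseline against which the reported 150-400x GPU-CPU speed-ups of the NGIM-accelerated evaluation would be compared; the NGIM itself replaces most of these pairwise interactions with sparse-grid interpolation plus amplitude and phase compensation.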