Spherical Harmonic Transform with GPUs

Abstract

We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphics processing units (GPUs). We use CUDA and base our implementation on a FORTRAN90 routine included in a publicly available parallel package, S²HAT. We focus on two major sequential steps involved in the transform computation, retaining the efficient parallel framework of the original code. We detail the optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the FORTRAN90 version. We present performance comparisons of a single CPU plus GPU unit with the S²HAT code running on either a single processor or 4 processors. In particular, we find that the latest generation of GPUs, such as the NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by as much as 18 times with respect to S²HAT executed on one core, and by as much as 5.5 times with respect to S²HAT on 4 cores, with the overall performance being limited by the fast Fourier transforms. The work presented here has been performed in the context of Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability.