Journal: Concurrency and Computation

Parallel spherical harmonic transforms on heterogeneous architectures (graphics processing units/multi-core CPUs)

Abstract

Spherical harmonic transforms (SHT) are at the heart of many scientific and practical applications, ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have recently been proposed that require simulations and analyses of experimental or observational data at very high resolutions and in unprecedented volumes. Both of these aspects pose a formidable challenge for the existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures: multi-core processors and graphics processing units (GPUs). It also discusses their performance, both alone and embedded within a top-level, Message Passing Interface (MPI)-based parallelisation layer ported from the S²HAT library, in terms of accuracy, overall efficiency and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest Compute Unified Device Architecture (CUDA) architecture (Fermi) outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors, for problem sizes motivated by our target applications. The performance of the direct transforms is, however, found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in calculating the transforms, emphasising those with a major impact on overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.
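
For orientation, the two operations contrasted in the abstract are conventionally defined as below. This is the standard textbook form, not a formula quoted from the paper; the symbols (the signal s, coefficients a_{\ell m}, band limit \ell_{\max}, and pixel weights w_p) are notation assumed here for illustration.

```latex
% Inverse SHT (synthesis): harmonic coefficients -> signal on the sphere
s(\theta, \phi) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
    a_{\ell m} \, Y_{\ell m}(\theta, \phi)

% Direct SHT (analysis): signal on the sphere -> harmonic coefficients,
% approximated by a weighted sum over grid points p with assumed weights w_p
a_{\ell m} = \int s(\theta, \phi) \, Y^{*}_{\ell m}(\theta, \phi) \, d\Omega
    \approx \sum_{p} w_p \, s(\theta_p, \phi_p) \, Y^{*}_{\ell m}(\theta_p, \phi_p)
```

The inverse transform is a sum over coefficients evaluated at each grid point, while the direct transform is a sum over grid points accumulated into each coefficient. That difference in access pattern is one plausible reason the two directions can behave differently on parallel hardware, though the abstract itself does not state the cause.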