Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs)

Abstract

Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas, new, cutting-edge science goals have recently been proposed that require simulations and analyses of experimental or observational data at very high resolutions and in unprecedented volumes. Both aspects pose a formidable challenge for existing implementations of the transforms. This paper describes parallel algorithms for computing SHTs with two variants of intra-node parallelism appropriate for novel supercomputer architectures, multi-core processors and Graphics Processing Units (GPUs), and discusses their performance tests, alone and embedded within a top-level, MPI-based parallelization layer ported from the S$^2$HAT library, in terms of accuracy, overall efficiency and scalability. We show that our inverse SHTs run on GeForce 400 Series GPUs equipped with the latest CUDA architecture ("Fermi") outperform the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that an MPI/CUDA version of the inverse transform run on a cluster of 128 NVIDIA Tesla S1070 is as much as 3 times faster than the hybrid MPI/OpenMP version executed on the same number of quad-core Intel Nehalem processors for problem sizes motivated by our target applications. For the direct transforms, however, the performance is found to be at best comparable. Here we discuss in detail optimizations of the two major steps involved in the transform calculation, demonstrating how the overall performance efficiency can be obtained, and elucidating the sources of the dichotomy between the direct and the inverse operations.
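
For orientation, the inverse and direct operations contrasted in the abstract can be written in their standard textbook form. This is a sketch only: the symbols $a_{\ell m}$, $Y_{\ell m}$, $\ell_{\max}$, the pixel index $p$ and the quadrature weights $w_p$ are notation assumed here, not taken from the paper, and the identification of "inverse" with synthesis and "direct" with analysis follows the usual convention.

```latex
% Inverse transform (synthesis): harmonic coefficients -> field on the sphere
s(\theta,\phi) \;=\; \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
    a_{\ell m}\, Y_{\ell m}(\theta,\phi)

% Direct transform (analysis): sampled field -> harmonic coefficients,
% with the integral over the sphere replaced by a quadrature over pixels p
a_{\ell m} \;=\; \int s(\theta,\phi)\, Y^{*}_{\ell m}(\theta,\phi)\, d\Omega
    \;\approx\; \sum_{p} w_p\, s(\theta_p,\phi_p)\, Y^{*}_{\ell m}(\theta_p,\phi_p)

% Separation of the basis functions, with N_{\ell m} a normalization constant
% and P_{\ell m} the associated Legendre function
Y_{\ell m}(\theta,\phi) \;=\; N_{\ell m}\, P_{\ell m}(\cos\theta)\, e^{i m \phi}
```

Because $Y_{\ell m}$ factorizes in this way, the computation is commonly organized as a Legendre-function recurrence over $\ell$ for each $m$, plus a Fourier transform over $m$ along each ring of constant $\theta$. Whether these are exactly the "two major steps" the abstract refers to is an assumption here, since the abstract does not name them.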