Parallel spherical harmonic transforms on heterogeneous architectures (graphics processing units/multi-core CPUs)

Mikolaj Szydlarski; Pierre Esterie; Joel Falcou; Laura Grigori; Radek Stompor

Abstract

Spherical harmonic transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modelling to cosmological observations. In many of these areas, new cutting-edge science goals have been recently proposed requiring simulations and analyses of experimental or observational data at very high resolutions and of unprecedented volumes. Both these aspects pose a formidable challenge for the currently existing implementations of the transforms. This paper describes parallel algorithms for computing SHT with two variants of intra-node parallelism appropriate for novel supercomputer architectures, multi-core processors and Graphic Processing Units (GPU). It also discusses their performance, alone and embedded within a top-level, Message Passing Interface-based parallelisation layer ported from the S²HAT library, in terms of their accuracy, overall efficiency and scalability. We show that our inverse SHT run on GeForce 400 Series GPUs equipped with the latest Compute Unified Device Architecture (CUDA) architecture (Fermi) outperforms the state-of-the-art implementation for a multi-core processor executed on a current Intel Core i7-2600K. Furthermore, we show that a Message Passing Interface/Compute Unified Device Architecture version of the inverse transform run on a cluster of 128 Nvidia Tesla S1070 is as much as 3 times faster than the hybrid Message Passing Interface/OpenMP version executed on the same number of quad-core Intel Nehalem processors for problem sizes motivated by our target applications. Performance of the direct transforms is however found to be at best comparable in these cases. We discuss in detail the algorithmic solutions devised for the major steps involved in the transforms calculation, emphasising those with a major impact on their overall performance, and elucidate the sources of the dichotomy between the direct and the inverse operations.

Bibliographic record

Source
Concurrency and Computation, 2014, Issue 3, pp. 683-711 (29 pages)
Authors
Mikolaj Szydlarski; Pierre Esterie; Joel Falcou; Laura Grigori; Radek Stompor
Author affiliations

INRIA Saclay-Île de France, F-91893 Orsay, France;

Université Paris Sud, F-91405 Orsay, France;

Université Paris Sud, F-91405 Orsay, France;

INRIA Rocquencourt, Alpines, B.P. 105, F-78153, Le Chesnay Cedex, France; UPMC Univ Paris 06, CNRS UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris, France;

APC, Univ Paris Diderot, CNRS/IN2P3, CEA/Irfu, Obs de Paris, Sorbonne Paris Cité, France;

Original format: PDF
Language: English
Keywords
spherical harmonic transforms; hybrid architectures; hybrid programming; CUDA; multi-GPU; CMB;

Similar literature

1. Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs) [J]. Jing Li, Yunfeng Jiang, Chaowei Yang. Computers & geosciences, 2013, Issue SEPa
2. Speeding up the log-polar transform with inexpensive parallel hardware: graphics units and multi-core architectures [J]. Antonelli Marco, Igual Francisco D., Ramos Francisco. Journal of Real-Time Image Processing, 2015, Issue 3
3. Design Methodology of the Heterogeneous Multi-core Processor With the Combination of Parallelized Multi-core Simulator and Common Register File-Based Instruction Set Extension Architecture [J]. Bingbing Xia, Fei Qiao, Huazhong Yang. Journal of Computers, 2013, Issue 2
4. Acceleration of Tandem Mass Spectrometry Analysis Software CoCoozo using Multi-core CPUs and Graphics Processing Units [C]. Yasufumi Obata, Takashi Ishida, Tohru Natsume. International Conference on Parallel and Distributed Processing Techniques and Applications, 2013
5. Exploiting multi-core processors for the service oriented architecture paradigm: Parallel XML processing and concurrent service orchestration [D]. Lu, Wei. 2009
6. A Parallel Architecture for the Partitioning around Medoids (PAM) Algorithm for Scalable Multi-Core Processor Implementation with Applications in Healthcare [O]. Hassan Mushtaq, Sajid Gul Khawaja, Muhammad Usman Akram. 2018
7. Parallel Spherical Harmonic Transforms on heterogeneous architectures (GPUs/multi-core CPUs) [O]. Szydlarski Mikolaj, Esterie Pierre, Falcou Joel. 2012
