Computer Physics Communications

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

Abstract

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism provided by the data-parallel devices are described in terms of data layout, data flow, and data-parallel instructions. Optimized Cell and GPU performance is compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell, and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors, or two GPUs in a shared memory configuration (without MPI). Finally, we compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured-grid-based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications.
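
To make the algorithm class concrete, the sketch below shows one explicit time step for a model hyperbolic problem, the 1-D linear advection equation u_t + a u_x = 0, discretized with the Lax-Friedrichs scheme on a structured grid and written as a CUDA kernel: one thread per cell, a nearest-neighbour stencil, and a buffer swap per step. This is a minimal, hypothetical illustration of the technique the abstract describes, not code from the paper; all identifiers (advect_step, step, u_old, u_new) are illustrative.

    #include <cuda_runtime.h>

    // Minimal sketch (not from the paper): one explicit Lax-Friedrichs
    // step for u_t + a*u_x = 0. Each thread updates one interior cell
    // from its two neighbours -- the fine, data-parallel level.
    __global__ void advect_step(const float *u_old, float *u_new,
                                int n, float a, float dt, float dx)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= 1 && i < n - 1) {
            u_new[i] = 0.5f * (u_old[i - 1] + u_old[i + 1])
                     - a * dt / (2.0f * dx) * (u_old[i + 1] - u_old[i - 1]);
        }
    }

    // Host-side driver: one kernel launch per time step; the caller
    // swaps d_u_old and d_u_new afterwards. Error checking omitted.
    void step(float *d_u_old, float *d_u_new,
              int n, float a, float dt, float dx)
    {
        int threads = 256;
        int blocks  = (n + threads - 1) / threads;
        advect_step<<<blocks, threads>>>(d_u_old, d_u_new, n, a, dt, dx);
    }

The explicit update is stable under the usual CFL condition |a| dt / dx <= 1, and the double-precision variant the abstract compares against follows the same pattern with double in place of float.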
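At the coarsest level of parallelism mentioned in the abstract, MPI communication between nodes for such stencil codes typically reduces to a halo (ghost-cell) exchange before each explicit step. The following is a hedged sketch assuming a 1-D domain decomposition in which each rank stores n_local interior cells plus one ghost cell at each end; exchange_halos and the buffer layout are assumptions for illustration, not the paper's implementation.

    #include <mpi.h>

    // Minimal sketch, assuming a 1-D decomposition: u holds interior
    // cells in u[1..n_local], with ghost cells u[0] and u[n_local + 1].
    void exchange_halos(float *u, int n_local, int rank, int nprocs)
    {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        // Rightmost interior cell goes right; left ghost arrives from the left.
        MPI_Sendrecv(&u[n_local], 1, MPI_FLOAT, right, 0,
                     &u[0],       1, MPI_FLOAT, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Leftmost interior cell goes left; right ghost arrives from the right.
        MPI_Sendrecv(&u[1],           1, MPI_FLOAT, left,  1,
                     &u[n_local + 1], 1, MPI_FLOAT, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

On a GPU cluster, the ghost cells would additionally be staged between device and host memory around the exchange (or sent directly with a CUDA-aware MPI), which is the sort of data-flow consideration the abstract alludes to.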