Annals of Nuclear Energy

Implementation of the CPU/GPU hybrid parallel method of characteristics neutron transport calculation using the heterogeneous cluster with dynamic workload assignment



Abstract

In recent years, graphics processing units (GPUs) have been adopted in many High-Performance Computing (HPC) systems due to their massive computational power and superior energy efficiency. Accelerating CPU-only computational codes on heterogeneous clusters equipped with multi-core CPUs and GPUs has therefore attracted considerable attention. One focus of heterogeneous computing is to efficiently exploit all computational resources available on a cluster, both CPUs and GPUs. In this paper, a heterogeneous MPI + OpenMP/CUDA parallel algorithm for solving the 2D neutron transport equation with the method of characteristics (MOC) is implemented. In this algorithm, the spatial domain decomposition technique provides the coarse-grained parallelism through the MPI protocol, while the fine-grained parallelism is exploited through OpenMP (in the CPU-calculated domains) and CUDA (in the GPU-calculated domains) based on ray parallelization. In order to efficiently leverage the computing power of heterogeneous clusters, a dynamic workload assignment scheme is proposed, which distributes the workload based on the runtime performance of the CPUs and GPUs in the cluster. Moreover, the strong-scaling performance of the MPI + CUDA parallelization is studied through a performance analysis model that quantifies the impact of the degradation of the iteration scheme, the load imbalance issue, the data copies between CPUs and GPUs, and the MPI communication in the MPI + CUDA parallel algorithm; the corresponding conclusions remain valid for the MPI + OpenMP/CUDA parallelization. The C5G7 2D benchmark and an extended 2D whole-core problem are calculated with the MPI + CUDA, MPI + OpenMP/CUDA, and MPI parallelizations for comparison. Numerical results demonstrate that the heterogeneous parallel algorithm maintains the desired accuracy, and that the dynamic workload assignment scheme provides an optimal workload assignment that closely matches the experimental results. In addition, an improvement of over 11% is observed for the MPI + OpenMP/CUDA parallelization compared with the MPI + CUDA parallelization. Moreover, the CPU/GPU heterogeneous clusters significantly outperform the CPU-only clusters, and one heterogeneous node is roughly five times faster than a CPU-only node. (C) 2019 Elsevier Ltd. All rights reserved.
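
The dynamic workload assignment scheme lends itself to a compact illustration. The C++ sketch below is a hypothetical rebalancing step, not the authors' implementation: it assumes each processing domain (a group of CPU threads or a GPU) reports the wall time of its last transport sweep, and the characteristic tracks are then redistributed in proportion to the measured throughputs so that all domains ideally finish a sweep together. The struct and function names are illustrative assumptions.

```cpp
// workload_split.cpp -- hypothetical sketch of runtime-based workload assignment.
// Compile: g++ -std=c++17 workload_split.cpp -o workload_split
#include <cstdio>
#include <vector>

// One entry per computational resource (a multi-core CPU or a GPU) on the node/cluster.
struct Device {
    const char* name;
    long   tracks_assigned;   // characteristic tracks swept in the last iteration
    double sweep_seconds;     // measured wall time of that sweep
    double throughput() const { return tracks_assigned / sweep_seconds; }
};

// Redistribute the total number of tracks in proportion to each device's
// measured throughput, so that every device finishes its sweep at about the same time.
void rebalance(std::vector<Device>& devices, long total_tracks) {
    double total_rate = 0.0;
    for (const Device& d : devices) total_rate += d.throughput();
    long assigned = 0;
    for (size_t i = 0; i < devices.size(); ++i) {
        if (i + 1 == devices.size()) {
            devices[i].tracks_assigned = total_tracks - assigned;  // absorb rounding
        } else {
            devices[i].tracks_assigned =
                static_cast<long>(total_tracks * devices[i].throughput() / total_rate);
            assigned += devices[i].tracks_assigned;
        }
    }
}

int main() {
    // Made-up calibration timings: the GPU processed the same share of tracks
    // about five times faster than the 12 CPU threads combined.
    std::vector<Device> devices = {
        {"CPU (OpenMP, 12 threads)", 100000, 2.50},
        {"GPU (CUDA)",               100000, 0.50},
    };
    const long total_tracks = 200000;
    rebalance(devices, total_tracks);
    for (const Device& d : devices)
        std::printf("%-26s -> %ld tracks\n", d.name, d.tracks_assigned);
    return 0;
}
```

With these made-up timings, the 200,000 tracks are split roughly 33,000 to the CPU and 167,000 to the GPU; in the actual algorithm this split would be recomputed from the runtime performance observed on the cluster.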
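The abstract does not reproduce the performance analysis model itself. As a minimal sketch, assuming (our assumption, not the paper's stated form) that the per-iteration time decomposes additively into a sweep term, a CPU-GPU copy term, and an MPI communication term, the strong-scaling speedup on $P$ spatial subdomains could be written as

$$
S(P) = \frac{N_{\text{iter}}(1)\,T_{\text{iter}}(1)}{N_{\text{iter}}(P)\,T_{\text{iter}}(P)},
\qquad
T_{\text{iter}}(P) \approx \max_{p \le P}\bigl(T_{\text{sweep},p} + T_{\text{copy},p}\bigr) + T_{\text{MPI}}(P),
$$

where the ratio $N_{\text{iter}}(P)/N_{\text{iter}}(1) \ge 1$ captures the degradation of the iteration scheme under spatial domain decomposition, the maximum over subdomains $p$ captures the load imbalance, $T_{\text{copy},p}$ the data copies between CPUs and GPUs, and $T_{\text{MPI}}(P)$ the MPI communication cost.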
