

Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics Simulations



Abstract

Peridigm is one of the most widely used peridynamics (PD) simulation packages for problems involving discontinuities, such as cracks and fragmentation. However, long-duration, large-scale simulations are very time-consuming in Peridigm. To improve its performance and scalability, we port Peridigm to SIMT accelerators and optimize it there. The complex calculations and heavy memory traffic of PD simulations make an efficient SIMT implementation challenging. In this study, we propose a series of strategies and techniques to address these challenges. We first adjust the bond-based calculation algorithms to eliminate data conflicts with minimal overhead, which allows Peridigm to run in parallel on accelerators. We then propose thread grouping and collaborative memory access strategies to reduce the overhead of fetching data from device memory. To improve computational efficiency, we also refine the calculation instructions. Finally, we introduce a transmission-computation overlapping strategy that reduces the overhead of data transmission and improves scalability. On 4 Nvidia Tesla V100 GPUs, the optimized Peridigm runs 10.24 times faster than the basic parallel Peridigm on the same 4 GPUs. Compared with the original Peridigm running on 8 Intel Xeon Gold 6248 CPUs (160 cores, 320 threads) and the optimized PD application running on 4 SW26010 processors (1,040 cores), our work on 4 V100 GPUs is 9 times and 4 times faster, respectively. For large-scale simulations, because we do not have enough V100 GPUs, we run our work on noncommercial SIMT accelerators whose performance is similar to that of the PCIe version of the V100. As the example scales from 282,000 points to 36,096,000 points and the number of accelerators scales from 4 to 512, we observe near-linear scalability, with performance ultimately reaching 825.72 TFLOPS at 98.81% parallel efficiency.
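The abstract does not say how the bond-based algorithm was adjusted, so the following is only a minimal sketch of the kind of data conflict it refers to, not the authors' implementation. It assumes a 1D bar with a linear microelastic bond law, and every name in it (`bond_force`, `scatter_forces`, `gather_forces`, `build_neighbors`, `c`, `vol`) is hypothetical. The per-bond "scatter" loop updates both endpoints of a bond, so parallel threads would race on the same force entries. The per-point "gather" loop evaluates each bond twice but writes only to its own point, which is one common way to remove such conflicts on SIMT hardware.

```python
# Sketch only: 1D bond-based peridynamic force accumulation, written two ways.
# The bond law, the constants, and all function names are illustrative assumptions.

def bond_force(xi, xj, ui, uj, c):
    """Force density on point i due to point j (1D, linear microelastic bond)."""
    ref = xj - xi                        # reference bond vector
    cur = (xj + uj) - (xi + ui)          # deformed bond vector
    stretch = (abs(cur) - abs(ref)) / abs(ref)
    direction = 1.0 if cur > 0 else -1.0
    return c * stretch * direction


def build_neighbors(x, horizon):
    """Full neighbor list: for each point, all other points within the horizon."""
    n = len(x)
    return [[j for j in range(n) if j != i and abs(x[j] - x[i]) <= horizon]
            for i in range(n)]


def scatter_forces(x, u, bonds, c, vol):
    """Visit each bond once and update BOTH endpoints.
    If bonds were processed by parallel threads, the two += writes could race."""
    f = [0.0] * len(x)
    for i, j in bonds:                   # each unordered pair appears once (i < j)
        fij = bond_force(x[i], x[j], u[i], u[j], c)
        f[i] += fij * vol
        f[j] -= fij * vol                # equal and opposite reaction
    return f


def gather_forces(x, u, neighbors, c, vol):
    """Visit each point once and sum over its full neighbor list.
    Each bond is evaluated twice, but iteration i writes only f[i]."""
    f = [0.0] * len(x)
    for i in range(len(x)):              # stands in for one thread per point
        acc = 0.0
        for j in neighbors[i]:
            acc += bond_force(x[i], x[j], u[i], u[j], c) * vol
        f[i] = acc
    return f
```

Calling `build_neighbors(x, horizon)` once and passing the result to `gather_forces` should give the same forces as `scatter_forces` up to floating-point rounding. The trade-off is redundant bond evaluations in exchange for conflict-free writes.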


Bibliographic details

Source: IEEE International Parallel and Distributed Processing Symposium, 2021, pp. 433-443 (11 pages)
Authors: Xinyuan Li; Huang Ye; Jian Zhang
Original format: PDF
Keywords: Performance evaluation; Ports (computers); Scalability; Instruction sets; Memory management; Collaboration; Software


Similar literature

1. OpenCL implementation of a high performance 3D Peridynamic model on graphics accelerators [J]. Mossaiby F., Shojaei A., Zaccariotto M. Computers & Mathematics with Applications. 2017, No. 8
2. Fluid-elastic structure interaction simulation by using ordinary state-based peridynamics and peridynamic differential operator [J]. Yan Gao, Selda Oterkus. Engineering Analysis with Boundary Elements. 2020, No. Deca
3. Ring-mesh: a scalable and high-performance approach for manycore accelerators [J]. Mazumdar Somnath, Scionti Alberto. Journal of Supercomputing. 2020, No. 9
4. Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors [C]. Benjamin Klenk, Holger Fröning, Hans Eberle. IEEE International Parallel and Distributed Processing Symposium. 2017
5. Neutron exposure from electron linear accelerators and a proton accelerator: Measurements and simulations [D]. Chen, Kuan Ling. 2011
6. Fast Acceleration of 2D Wave Propagation Simulations Using Modern Computational Accelerators [O]. Wei Wang, Lifan Xu, John Cavazos
7. High-performance SIMT code generation in an active visual effects library [O]. Jay L. T. Cornwall, Lee Howes, Paul H. J. Kelly. 2009
