Conference: IEEE International Parallel and Distributed Processing Symposium

Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics Simulations

Abstract

Peridigm is one of the most widely used peridynamics (PD) simulation packages for problems involving discontinuities, such as cracks and fragmentation. However, long-running, large-scale simulations are very time-consuming in Peridigm. To improve its performance and scalability, we port Peridigm to SIMT accelerators and optimize it there. The complex calculations and massive memory accesses of PD simulations make an efficient SIMT implementation challenging. In this study, we propose a series of strategies and techniques to optimize the performance of Peridigm. We first adjust the algorithms of the bond-based calculations to eliminate data conflicts with minimal overhead, which makes a parallel Peridigm on accelerators possible. Furthermore, we propose thread grouping and collaborative memory access strategies to reduce the overhead of fetching data from device memory. To improve computational efficiency, we also refine the calculation instructions. Finally, we introduce a transmission-computation overlapping strategy that reduces the overhead of data transmission and improves scalability. The optimized Peridigm on 4 Nvidia Tesla V100 GPUs is 10.24 times faster than the basic parallel Peridigm on 4 V100 GPUs. Compared with the original Peridigm running on 8 Intel Xeon Gold 6248 CPUs (160 cores, 320 threads) and with the optimized PD application running on 4 SW26010 processors (1,040 cores), our work on 4 V100 GPUs accelerates the simulation 9 times and 4 times, respectively. For large-scale simulations, because we do not have enough V100 GPUs, we run our work on noncommercial SIMT accelerators whose performance is similar to that of the PCIe version of the V100. As the example scales from 282,000 points to 36,096,000 points and the number of accelerators scales from 4 to 512, we observe near-linear scalability, with performance ultimately reaching 825.72 TFLOPS at 98.81% parallel efficiency.
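The data conflict mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the authors restructure the bond-based calculation, so the code below is not their method. It only contrasts two generic loop orders on a hypothetical 1D bar: a per-bond "scatter" loop that writes to both endpoints of every bond (these writes would collide if bonds were processed by parallel threads), and a per-point "gather" loop in which each point writes only its own entry, at the cost of evaluating every bond twice. All function names, the linear bond law, and the constant `c` are assumptions made for illustration.

```python
def build_neighbors(x, horizon):
    """For each point, list the indices of all other points within the horizon."""
    n = len(x)
    return [[j for j in range(n) if j != i and abs(x[j] - x[i]) <= horizon]
            for i in range(n)]


def bond_force(x, u, i, j, c):
    """Force on point i from the bond (i, j) in 1D, using an assumed linear bond law."""
    xi = x[j] - x[i]                      # reference bond vector
    eta = u[j] - u[i]                     # relative displacement
    stretch = (abs(xi + eta) - abs(xi)) / abs(xi)
    direction = 1.0 if (xi + eta) > 0 else -1.0
    return c * stretch * direction


def forces_scatter(x, u, neighbors, c):
    """Visit each bond once and update BOTH endpoints.
    If bonds were handled by parallel threads, the two writes could
    collide with writes from other bonds sharing a point."""
    f = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            if j > i:                     # each bond handled exactly once
                fb = bond_force(x, u, i, j, c)
                f[i] += fb
                f[j] -= fb                # equal and opposite reaction
    return f


def forces_gather(x, u, neighbors, c):
    """Each point sums over its own neighbor list and writes only f[i].
    No two iterations of the outer loop write the same entry, so the
    outer loop is conflict-free, but every bond is evaluated twice."""
    f = [0.0] * len(x)
    for i, nbrs in enumerate(neighbors):
        acc = 0.0
        for j in nbrs:
            acc += bond_force(x, u, i, j, c)
        f[i] = acc
    return f


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]         # reference positions
    u = [0.0, 0.1, 0.0, -0.1, 0.0]        # displacements
    nb = build_neighbors(x, horizon=2.0)
    print(forces_scatter(x, u, nb, c=1.0))
    print(forces_gather(x, u, nb, c=1.0))
```

Both loops produce the same forces up to floating-point rounding. The trade-off shown here (redundant bond evaluation in exchange for conflict-free writes) is one common way to approach the problem; whether it matches the "minimized overhead" scheme of the paper cannot be determined from the abstract.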