Journal: Parallel Computing

Superlinear speedup phenomenon in parallel 3D Discrete Element Method (DEM) simulations of complex-shaped particles

Abstract

Strong superlinear speedup has been observed in large-scale simulations with a parallel 3D DEM code for complex-shaped particles. The code is based on a spatial domain decomposition algorithm and exhibits "high-CPU-low-memory" characteristics. Interpreting this phenomenon requires a careful examination of speedup theory and practice in parallel computing. The superlinear speedup is investigated from three perspectives: (i) memory footprint per process, (ii) L1, L2, and L3 cache miss rates, and (iii) uniprocessor performance. The study covers a wide range of problem sizes (spanning five orders of magnitude in the number of particles) and numbers of compute nodes (1 to 2048 nodes) on DoD supercomputers. The Performance API (PAPI) is called from the source code to measure cache miss rates and FLOPS. The strong scaling measurements show that cache miss rates are sensitive to the shrinking memory consumption per processor, and that among the three cache levels the last-level cache (LLC) contributes most significantly to the strong superlinear speedup; the weak scaling measurements reveal the same trend. These findings are associated with the inherently perfect scalability of 3D DEM: its memory scalability function is a nonlinearly decreasing function of the number of processors. In addition, uniprocessor FLOPS performance that stays constant (rather than increasing) with problem size can also contribute to the superlinear speedup.
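The speedup bookkeeping behind these claims can be made concrete with a small sketch. It assumes hypothetical wall-clock timings and a hypothetical per-particle memory cost (neither is taken from the paper). It computes relative strong-scaling speedup S(p) = T(p0)/T(p) against the smallest run, parallel efficiency E = S / (p/p0), and flags points with E > 1 as superlinear. It also shows the idealized per-process memory footprint shrinking as 1/p under spatial domain decomposition. Halo (ghost-particle) storage and cache behavior are deliberately left out; the paper measures the latter with PAPI hardware counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    procs: int        # number of processes (e.g., MPI ranks)
    wall_time: float  # measured wall-clock time in seconds


def strong_scaling(runs, n_particles, bytes_per_particle):
    """Relative speedup, parallel efficiency, and idealized memory per process.

    Speedup is taken relative to the run with the fewest processes (p0),
    so S(p) = T(p0) / T(p) and the ideal (linear) speedup is p / p0.
    """
    runs = sorted(runs, key=lambda r: r.procs)
    base = runs[0]
    total_bytes = n_particles * bytes_per_particle
    rows = []
    for r in runs:
        speedup = base.wall_time / r.wall_time
        ideal = r.procs / base.procs
        efficiency = speedup / ideal
        rows.append({
            "procs": r.procs,
            "speedup": speedup,
            "efficiency": efficiency,
            # Superlinear: faster than linear scaling predicts (small tolerance
            # so the baseline run, with E == 1 exactly, is not flagged).
            "superlinear": efficiency > 1.0 + 1e-9,
            # Idealized spatial decomposition: each rank holds ~1/p of the
            # particles; halo (ghost-particle) storage is ignored here.
            "bytes_per_proc": total_bytes / r.procs,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical timings for illustration only, NOT measurements from the paper.
    runs = [Run(1, 1000.0), Run(2, 480.0), Run(4, 230.0), Run(8, 130.0)]
    for row in strong_scaling(runs, n_particles=1_000_000, bytes_per_particle=2_000):
        print(f"p={row['procs']:>2}  S={row['speedup']:.2f}  "
              f"E={row['efficiency']:.2f}  superlinear={row['superlinear']}  "
              f"mem/proc={row['bytes_per_proc'] / 2**20:.0f} MiB")
```

With these made-up numbers the 2- and 4-process runs come out superlinear and the 8-process run does not. One way to connect such a table to the cache-miss explanation would be to compare the per-process footprint against the LLC capacity of the target node and check it against measured LLC miss rates; that comparison needs real hardware-counter data and is not attempted here.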