International Conference ISC High Performance: International Conference on High Performance Computing

Shared-Memory Parallel Probabilistic Graphical Modeling Optimization: Comparison of Threads, OpenMP, and Data-Parallel Primitives

Abstract

This work examines the performance characteristics of multiple shared-memory implementations of a probabilistic graphical modeling (PGM) optimization code, which forms the basis of an advanced, state-of-the-art image segmentation method. The work is motivated by the need to accelerate scientific image analysis pipelines used in experimental science, such as at x-ray light sources, and by the need for platform-portable codes that perform well across many different computational architectures. The primary focus of this work, and its main contribution, is an in-depth study of the shared-memory parallel performance of different implementations that use alternative parallelization approaches: C11-threads, OpenMP, and data parallel primitives (DPPs). Our results show that, for this complex data-intensive algorithm, the DPP implementation achieves better runtime performance but less favorable scaling characteristics than its C11-threads and OpenMP counterparts. Based on a set of experiments that collect hardware performance counters on multiple platforms, the runtime difference appears to be due primarily to algorithmic efficiency gains: reformulating the solution from its traditional C11-threads and OpenMP expression into data parallel primitives results in significantly fewer instructions being executed. This study is the first of its kind to use hardware counters to compare the performance of methods based on VTK-m data-parallel primitives with methods based on more traditional OpenMP or thread-based parallelism. It is timely, given the growing awareness of the need for platform portability as node-level parallelism and device heterogeneity increase.
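To make the three programming models concrete, here is a minimal C++ sketch. It is not the paper's code: it assumes a hypothetical per-label statistics kernel (summing pixel values and counting pixels for each segmentation label), of the kind a PGM-based segmentation step might use, and the names `LabelStats`, `stats_threads`, `stats_openmp`, and `stats_dpp` are illustrative only. The thread version uses `std::thread`; the OpenMP version uses an array-section reduction; the DPP version re-expresses the same work as sort-by-key followed by reduce-by-key, two primitives that DPP libraries such as VTK-m provide, but here each primitive is written serially with the standard library as a stand-in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-label statistics: sum of pixel values and pixel count per label.
struct LabelStats {
    std::vector<double> sum;
    std::vector<std::size_t> count;
    explicit LabelStats(std::size_t n) : sum(n, 0.0), count(n, 0) {}
};

// (1) Explicit threads: each thread accumulates a private partial result over a
// contiguous chunk of pixels; the partials are merged serially afterwards.
LabelStats stats_threads(const std::vector<double>& values,
                         const std::vector<std::size_t>& labels,
                         std::size_t num_labels, unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = values.size();
    std::vector<LabelStats> partial(num_threads, LabelStats(num_labels));
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;
            for (std::size_t i = begin; i < end; ++i) {
                assert(labels[i] < num_labels);
                partial[t].sum[labels[i]] += values[i];
                partial[t].count[labels[i]] += 1;
            }
        });
    }
    for (auto& w : workers) w.join();
    LabelStats out(num_labels);
    for (const auto& p : partial)
        for (std::size_t k = 0; k < num_labels; ++k) {
            out.sum[k] += p.sum[k];
            out.count[k] += p.count[k];
        }
    return out;
}

// (2) OpenMP: an array-section reduction (OpenMP 4.5+) gives each thread a private,
// zero-initialised copy of sum[0:num_labels] and count[0:num_labels], combined at the end.
LabelStats stats_openmp(const std::vector<double>& values,
                        const std::vector<std::size_t>& labels,
                        std::size_t num_labels) {
    LabelStats out(num_labels);
    double* sum = out.sum.data();
    std::size_t* count = out.count.data();
    const std::size_t n = values.size();
    #pragma omp parallel for reduction(+ : sum[0:num_labels], count[0:num_labels])
    for (std::size_t i = 0; i < n; ++i) {
        sum[labels[i]] += values[i];
        count[labels[i]] += 1;
    }
    return out;
}

// (3) DPP-style formulation: sort-by-key, then reduce-by-key over runs of equal
// labels. Written serially here; a DPP library would run each primitive in parallel.
LabelStats stats_dpp(const std::vector<double>& values,
                     const std::vector<std::size_t>& labels,
                     std::size_t num_labels) {
    // Sort-by-key: order pixel indices by their label.
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return labels[a] < labels[b]; });
    // Reduce-by-key: one sum and one count per run of identical labels.
    LabelStats out(num_labels);
    for (std::size_t r = 0; r < idx.size();) {
        const std::size_t key = labels[idx[r]];
        assert(key < num_labels);
        std::size_t e = r;
        double s = 0.0;
        while (e < idx.size() && labels[idx[e]] == key) s += values[idx[e++]];
        out.sum[key] = s;
        out.count[key] = e - r;
        r = e;
    }
    return out;
}
```

Compile with `-pthread` and, to enable OpenMP, `-fopenmp`; without `-fopenmp` the pragma is ignored and `stats_openmp` runs serially with the same result. All three functions should return the same counts on the same input, while the floating-point sums may differ in rounding because each version adds values in a different order. The DPP variant trades an extra sort for a formulation built entirely from reusable primitives; this sketch makes no claim about which version executes fewer instructions.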
