Published in: International Conference on High Performance Computing

A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application


Abstract

Performance portability is becoming ever more important as next-generation high performance computing systems grow increasingly diverse and heterogeneous. Several new approaches to parallel programming, such as SYCL and Kokkos, have been developed in recent years to tackle this challenge. While several studies have evaluated these new programming models, they have tended to focus on memory-bandwidth-bound applications. In this paper we analyse the performance of what appear to be the most promising modern parallel programming models, on a diverse range of contemporary high-performance hardware, using a compute-bound molecular docking mini-app. We present miniBUDE, a mini-app for BUDE, the Bristol University Docking Engine, a real application routinely used for drug discovery. We benchmark miniBUDE on real-world inputs for the full-scale application so that the mini-app follows its performance profile closely. We implement the mini-app in different programming models targeting both CPUs and GPUs, including SYCL and Kokkos, two of the more promising and widely used modern parallel programming models. We then present an analysis of the performance of each implementation, which we compare to highly optimised baselines set using established programming models such as OpenMP, OpenCL, and CUDA. Our study includes a wide variety of modern hardware platforms covering CPUs based on x86 and Arm architectures, as well as GPUs. We found that, with the emerging parallel programming models, we could achieve performance comparable to that of the established models, and that a higher-level framework such as SYCL can achieve OpenMP levels of performance while aiding productivity. We identify a set of key challenges and pitfalls to take into account when adopting these emerging programming models, some of which are implementation-specific effects and not fundamental design errors that would prevent further adoption. Finally, we discuss our findings in the wider context of performance-portable compute-bound workloads.
