Conference: International Workshop on OpenMP

Evaluating OpenMP Affinity on the POWER8 Architecture


Abstract

As we move toward pre-Exascale systems, two of the DOE leadership-class systems will consist of very powerful OpenPOWER compute nodes that will be more complex to program. These systems will have massive amounts of parallelism, where threads may be running on POWER9 cores as well as on accelerators. Advances in memory interconnects, such as NVLINK, will provide a unified shared-memory address space for different types of memory (HBM, DRAM, etc.). In preparation for such systems, we need to improve our understanding of how OpenMP supports the concept of affinity as well as memory placement on POWER8 systems. Data locality and affinity are key program optimizations for exploiting the compute and memory capabilities of a node: they achieve good performance by minimizing data motion across NUMA domains and by accessing the cache efficiently. This paper is a first step toward evaluating the current features of OpenMP 4.0 on POWER8 processors and toward measuring their effects on a system with two POWER8 sockets. We experiment with the different affinity settings provided by OpenMP 4.0 to quantify the cost of having good data locality versus not having it, and we measure the effects via hardware counters. We also determine which affinity settings benefit more from data locality. Based on this study, we describe the current state of the art, the challenges we faced in quantifying the effects of affinity, and ideas on how OpenMP 5.0 should be improved to address affinity in the context of NUMA domains and accelerators.