Venue: International Workshop on OpenMP

Multiple Target Task Sharing Support for the OpenMP Accelerator Model

Abstract

The use of GPU accelerators is becoming common in HPC platforms because of their performance and energy efficiency. In addition, new generations of multicore processors are being designed with wider vector units and/or larger hardware thread counts, which also contribute to the peak performance of the whole system. Although current directive-based paradigms such as OpenMP and OpenACC support both accelerators and multicore hosts, they do not provide an effective and efficient way to use both concurrently. As a result, accelerated programs usually leave the computational potential of the host unexploited. In this paper we propose an extension to the OpenMP 4.5 directive-based programming model that supports the specification and execution of multiple instances of task regions on different devices (i.e., accelerators together with the vector and heavily multithreaded capabilities of multicore processors). The compiler generates device-specific code for each device kind and delegates the dynamic scheduling of tasks onto the available devices to the runtime system. The newly proposed clause gives the scheduler useful guidance while keeping the programmer interface clean, abstract and machine-independent. The potential of the proposal is analyzed with a prototype implementation in the OmpSs compiler and runtime infrastructure. Performance is evaluated with three kernels (N-Body, tiled matrix multiply and Stream) on different GPU-capable systems based on ARM, Intel x86 and IBM Power8. The evaluation shows speed-ups in the 8-20% range over versions that use only the GPU, reaching 96% of the additional peak performance thanks to reduced data transfers and the benefits of the OmpSs NUMA-aware scheduler.
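The core mechanism described above is a runtime that dynamically hands instances of one task region to whichever device (GPU or multicore host) is free. A small discrete-event sketch can illustrate this. It is not the OmpSs scheduler or the proposed OpenMP clause. The device names, the per-task costs, and the helpers `dynamic_schedule` and `fraction_of_additional_peak` are all hypothetical. The "fraction of additional peak" formula is also an assumed reading of the 96% figure. The sketch ignores data transfers and NUMA placement, both of which the abstract credits for part of the gain.

```python
import heapq


def dynamic_schedule(num_tasks, task_time):
    """Greedily assign identical task instances to whichever device frees up first.

    task_time maps a (hypothetical) device name to the time one task
    instance takes on that device. Returns (makespan, tasks_per_device).
    """
    # Min-heap of (time the device becomes idle, device name).
    idle = [(0.0, dev) for dev in sorted(task_time)]
    heapq.heapify(idle)
    counts = {dev: 0 for dev in task_time}
    makespan = 0.0
    for _ in range(num_tasks):
        t, dev = heapq.heappop(idle)       # earliest idle device
        end = t + task_time[dev]           # run one task instance there
        counts[dev] += 1
        makespan = max(makespan, end)
        heapq.heappush(idle, (end, dev))   # device is busy until `end`
    return makespan, counts


def fraction_of_additional_peak(t_gpu_only, t_shared, gpu_rate, host_rate):
    """Share of the host's extra throughput that shows up as speed-up.

    Assumed definition: (speedup - 1) / (host_rate / gpu_rate), so a
    scheduler that loses nothing to imbalance would score 1.0.
    """
    speedup = t_gpu_only / t_shared
    return (speedup - 1.0) / (host_rate / gpu_rate)


# Hypothetical costs: one task takes 1.0 time unit on the GPU and 5.0 on the host.
gpu_only, _ = dynamic_schedule(60, {"gpu": 1.0})
shared, per_device = dynamic_schedule(60, {"gpu": 1.0, "cpu": 5.0})
print(gpu_only, shared, per_device)  # → 60.0 50.0 {'gpu': 50, 'cpu': 10}
print(round(fraction_of_additional_peak(gpu_only, shared, 1.0, 1 / 5.0), 3))  # → 1.0
```

With these made-up costs the host adds 20% to the GPU's throughput. The greedy policy recovers all of it because 60 tasks split evenly into cycles of five GPU tasks and one host task. With 61 tasks, the last instance lands on the slower host and the makespan grows to 55. This tail imbalance is a limitation of the naive "first idle device" policy used in this sketch.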