Conference: International Workshop on OpenMP

The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT

Abstract

Heterogeneous systems are an important trend in the future of supercomputers, yet they can be hard to program, and developers still lack powerful tools for understanding how well their accelerated codes perform and how to improve them. Because different types of hardware accelerators are available, each with its own low-level programming API, there is not yet a clear consensus on a standard way to retrieve information about an accelerator's performance. To improve this situation, OMPT is a novel performance monitoring interface that is being considered for integration into the OpenMP standard. OMPT allows analysis tools to monitor the execution of parallel OpenMP applications by providing detailed information about the activity of the runtime through a standard API. For accelerator devices, OMPT also facilitates the exchange of performance information between the runtime and the analysis tool. We implement part of the OMPT specification that covers the use of accelerators in both the Nanos++ parallel runtime system and the Extrae tracing framework, obtaining detailed performance information about the execution of tasks issued to the accelerator devices for later analysis. Our work extends previous efforts to expose detailed information from the OpenMP and OmpSs runtimes about the activity and performance of task-based parallel applications. In this paper, we focus on the evaluation of FPGA devices, studying the performance of two kernels common in scientific algorithms: matrix multiplication and Cholesky decomposition. Furthermore, this development is seamlessly applicable to the analysis of GPGPU accelerators and Intel® Xeon Phi™ co-processors operating under the OmpSs programming model.