Conference: International Conference on High Performance Computing Simulation

OpenCL Performance Prediction using Architecture-Independent Features

Abstract

OpenCL is an attractive programming model for heterogeneous high-performance computing systems, with wide support from hardware vendors and significant performance portability. Efficient scheduling on HPC systems requires accurate performance predictions for OpenCL workloads on varied compute devices. This is challenging because diverse computation, communication and memory access characteristics cause performance to vary between devices. The Architecture Independent Workload Characterization (AIWC) tool can be used to characterize OpenCL kernels according to a set of architecture-independent features. This work presents a methodology in which AIWC features are used to form a model capable of predicting accelerator execution times. We used this methodology to predict execution times for a set of 37 computational kernels running on 15 different devices representing a broad range of CPU, GPU and MIC architectures. The predictions are highly accurate, differing from the measured experimental run-times by an average of only 1.2%, which corresponds to actual execution time mispredictions of 9 μs to 1 sec depending on problem size. A previously unencountered code can be instrumented once and the AIWC metrics embedded in the kernel, allowing performance prediction across the full range of modelled devices. The results suggest that this methodology supports correct selection of the most appropriate device for a previously unencountered code, which is highly relevant to the HPC scheduling setting.
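The workflow described in the abstract (characterize a kernel once, predict its run-time on each modelled device, then pick the fastest) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the model, so a nearest-neighbour lookup stands in for it, and the two-element feature vectors, device names and timings are invented placeholders, not AIWC output or measured results.

```python
import math

# Hypothetical training rows: (feature vector, device name, measured seconds).
# All values are made up for illustration; they are not AIWC metrics.
TRAINING = [
    ((0.9, 0.1), "cpu", 2.0e-3),
    ((0.9, 0.1), "gpu", 4.0e-4),
    ((0.2, 0.8), "cpu", 1.0e-3),
    ((0.2, 0.8), "gpu", 3.0e-3),
]

def predict_time(features, device, k=1):
    """Average the measured times of the k nearest training kernels on `device`.

    Stand-in model (an assumption): nearest neighbours in feature space.
    """
    rows = [(math.dist(features, f), t) for f, d, t in TRAINING if d == device]
    if not rows:
        raise ValueError(f"no training data for device {device!r}")
    rows.sort()
    nearest = rows[:k]
    return sum(t for _, t in nearest) / len(nearest)

def best_device(features, devices):
    """Return the device with the smallest predicted execution time."""
    return min(devices, key=lambda d: predict_time(features, d))

def relative_error(predicted, measured):
    """Absolute prediction error as a fraction of the measured run-time."""
    return abs(predicted - measured) / measured
```

Because the features are architecture-independent, a new kernel needs only one feature vector; calling `best_device` with that vector and the list of modelled devices then compares the per-device predictions, which is the device-selection use case the abstract highlights.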
