Conference: International Conference on High Performance Computing and Simulation

OpenCL Performance Prediction using Architecture-Independent Features



Abstract

OpenCL is an attractive programming model for heterogeneous high-performance computing systems, with wide support from hardware vendors and significant performance portability. To support efficient scheduling on HPC systems, it is necessary to perform accurate performance predictions for OpenCL workloads on varied compute devices. This is challenging because diverse computation, communication and memory access characteristics result in varying performance between devices. The Architecture Independent Workload Characterization (AIWC) tool can be used to characterize OpenCL kernels according to a set of architecture-independent features. This work presents a methodology in which AIWC features are used to form a model capable of predicting accelerator execution times. We used this methodology to predict execution times for a set of 37 computational kernels running on 15 different devices representing a broad range of CPU, GPU and MIC architectures. The predictions are highly accurate, differing from the measured experimental run-times by an average of only 1.2%, and correspond to actual execution time mispredictions of 9 μs to 1 sec according to problem size. A previously unencountered code can be instrumented once and the AIWC metrics embedded in the kernel, to allow performance prediction across the full range of modelled devices. The results suggest that this methodology supports correct selection of the most appropriate device for a previously unencountered code, which is highly relevant to the HPC scheduling setting.
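The idea of predicting per-device run times from architecture-independent features, and then choosing a device, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-element feature vectors are toy stand-ins for AIWC metrics, the timings are invented, the device names `cpu` and `gpu` are placeholders, and the nearest-neighbour predictor is a deliberately simple substitute, since the abstract does not specify the form of the authors' model.

```python
import math

# Hypothetical training set. Each feature tuple is a toy stand-in for AIWC
# metrics, and every timing (in seconds) is invented for illustration only.
TRAINING = [
    ((0.9, 0.1, 0.2), {"cpu": 0.40, "gpu": 0.05}),
    ((0.2, 0.8, 0.7), {"cpu": 0.10, "gpu": 0.30}),
    ((0.8, 0.2, 0.3), {"cpu": 0.50, "gpu": 0.06}),
    ((0.1, 0.9, 0.6), {"cpu": 0.08, "gpu": 0.25}),
]


def predict_times(features, training=TRAINING, k=2):
    """Predict a run time per device from the k most similar known kernels.

    Uses the geometric mean of the neighbours' times, because run times
    span several orders of magnitude. This is an assumed stand-in model,
    not the model from the paper.
    """
    nearest = sorted(training, key=lambda row: math.dist(row[0], features))[:k]
    devices = nearest[0][1].keys()
    return {
        dev: math.exp(sum(math.log(times[dev]) for _, times in nearest) / len(nearest))
        for dev in devices
    }


def select_device(features):
    """Return the device with the lowest predicted run time."""
    predicted = predict_times(features)
    return min(predicted, key=predicted.get)


def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / measured over all devices."""
    return sum(abs(predicted[d] - measured[d]) / measured[d] for d in measured) / len(measured)
```

Because the features do not depend on the device, a new kernel only needs to be characterized once. The same feature vector is then passed to `select_device` to rank every modelled device, and `mean_relative_error` compares predictions with measured times as a relative error.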
