Journal: Computing

FusionCL: a machine-learning based approach for OpenCL kernel fusion to increase system performance


Abstract

General-purpose graphics processing units (GPGPUs) programmed through OpenCL have greatly reduced the execution time of data-parallel applications by exploiting massive available parallelism. However, when an application with a small data size runs on a GPU, GPU resources are wasted because the application cannot fully utilize the GPU's compute cores. Because GPUs lack operating system support, there is no mechanism for sharing a GPU between two kernels. In this paper, we propose a GPU sharing mechanism between two kernels that increases GPU occupancy and, as a result, reduces the execution time of a job pool. However, if a pair of kernels competes for the same set of resources (i.e., both applications are compute-intensive or both are memory-intensive), kernel fusion may instead significantly increase the execution time of the fused kernels. It is therefore important to select an optimal pair of kernels for fusion, one that yields a significant speedup over their serial execution. This research presents FusionCL, a machine learning-based GPU sharing mechanism between a pair of OpenCL kernels. FusionCL first uses a machine learning-based fusion suitability classifier to identify every pair of kernels in the job pool that is a suitable candidate for fusion. Then, using a fusion speedup predictor, it selects from these candidates the pair predicted to produce the maximum speedup after fusion over serial execution. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83x compared to a baseline scheduling scheme. Compared to the state of the art, the reduction in execution time is up to 8%.
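The two-stage selection described in the abstract (filter pairs with a suitability classifier, then pick the pair with the highest predicted speedup) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Kernel` features, the threshold of 0.5, and the functions `toy_is_suitable` and `toy_predict_speedup` are hypothetical stand-ins for the trained models, which the abstract does not specify.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel descriptor; both features are illustrative only."""
    name: str
    compute_intensity: float  # assumed share of time spent on arithmetic (0..1)
    memory_intensity: float   # assumed share of time spent on memory access (0..1)


Pair = Tuple[Kernel, Kernel]


def select_fusion_pair(
    job_pool: Iterable[Kernel],
    is_suitable: Callable[[Kernel, Kernel], bool],
    predict_speedup: Callable[[Kernel, Kernel], float],
) -> Tuple[Optional[Pair], float]:
    """Return the suitable pair with the highest predicted speedup, if any."""
    best_pair: Optional[Pair] = None
    best_speedup = 1.0  # only accept pairs predicted to beat serial execution
    for a, b in combinations(list(job_pool), 2):
        if not is_suitable(a, b):  # stage 1: suitability filter
            continue
        speedup = predict_speedup(a, b)  # stage 2: speedup prediction
        if speedup > best_speedup:
            best_pair, best_speedup = (a, b), speedup
    return best_pair, best_speedup


def toy_is_suitable(a: Kernel, b: Kernel) -> bool:
    # Stand-in rule (assumption): reject pairs bound by the same resource.
    both_compute = a.compute_intensity > 0.5 and b.compute_intensity > 0.5
    both_memory = a.memory_intensity > 0.5 and b.memory_intensity > 0.5
    return not (both_compute or both_memory)


def toy_predict_speedup(a: Kernel, b: Kernel) -> float:
    # Stand-in rule (assumption): more complementary profiles score higher.
    return 1.0 + abs(a.compute_intensity - b.compute_intensity)


# Made-up job pool for illustration; the numbers are not measurements.
pool = [
    Kernel("matmul", 0.9, 0.1),
    Kernel("nbody", 0.8, 0.2),
    Kernel("vecadd", 0.1, 0.9),
    Kernel("transpose", 0.2, 0.8),
]

pair, speedup = select_fusion_pair(pool, toy_is_suitable, toy_predict_speedup)
if pair is not None:
    print(pair[0].name, pair[1].name)
```

In this sketch the two compute-bound kernels and the two memory-bound kernels are never paired with each other, mirroring the abstract's point that kernels competing for the same resources are poor fusion candidates. A real system would replace both toy functions with trained models.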
