FusionCL: a machine-learning based approach for OpenCL kernel fusion to increase system performance

Khalid Yasir Noman; Aleem Muhammad; Ahmed Usman; Prodan Radu; Islam Muhammad Arshad; Iqbal Muhammad Azhar


Abstract

Using general-purpose graphics processing units (GPGPUs) through OpenCL has greatly reduced the execution time of data-parallel applications by exploiting the massive parallelism available. However, when an application with a small data size runs on a GPU, GPU resources are wasted because the application cannot fully utilize the GPU's compute cores. Because GPUs lack operating system support, there is no mechanism to share a GPU between two kernels. In this paper, we propose a mechanism for sharing a GPU between two kernels, which increases GPU occupancy and, as a result, reduces the execution time of a job pool. However, if a pair of kernels competes for the same set of resources (i.e., both applications are compute-intensive or both are memory-intensive), kernel fusion may instead significantly increase the execution time of the fused kernels. It is therefore important to select a pair of kernels whose fusion yields a significant speedup over their serial execution. This research presents FusionCL, a machine learning-based mechanism for sharing a GPU between a pair of OpenCL kernels. FusionCL first identifies the pairs of kernels in the job pool that are suitable candidates for fusion, using a machine learning-based fusion suitability classifier. Then, from all the candidates, it uses a fusion speedup predictor to select the pair that will produce the maximum speedup after fusion over serial execution. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83x compared with a baseline scheduling scheme. Compared with the state of the art, the reduction in execution time is up to 8%.
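The two-stage selection described in the abstract (filter pairs with a suitability classifier, then pick the pair with the highest predicted speedup) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function `select_fusion_pair`, the kernel names, the `PROFILE` table, and the `TOY_SPEEDUP` values are all hypothetical, and the two toy functions merely stand in for the trained classifier and predictor.

```python
from itertools import combinations
from typing import Callable, Iterable, Optional, Tuple

Kernel = str  # assumption: a kernel is identified by its name only
Pair = Tuple[Kernel, Kernel]


def select_fusion_pair(
    job_pool: Iterable[Kernel],
    is_suitable: Callable[[Kernel, Kernel], bool],
    predict_speedup: Callable[[Kernel, Kernel], float],
) -> Optional[Pair]:
    """Return the suitable pair with the highest predicted speedup, or None."""
    # Stage 1: keep only the pairs the suitability classifier accepts.
    candidates = [
        (a, b) for a, b in combinations(job_pool, 2) if is_suitable(a, b)
    ]
    if not candidates:
        return None
    # Stage 2: choose the candidate with the largest predicted speedup.
    return max(candidates, key=lambda pair: predict_speedup(*pair))


# Hypothetical stand-ins for the trained models (made-up data).
PROFILE = {
    "matmul": "compute",
    "nbody": "compute",
    "vecadd": "memory",
    "transpose": "memory",
}
TOY_SPEEDUP = {("matmul", "vecadd"): 1.4, ("nbody", "transpose"): 1.9}


def toy_is_suitable(a: Kernel, b: Kernel) -> bool:
    # Toy rule: reject pairs that would compete for the same resource.
    return PROFILE[a] != PROFILE[b]


def toy_predict_speedup(a: Kernel, b: Kernel) -> float:
    # Toy lookup; pairs without an entry are assumed to give no speedup.
    return TOY_SPEEDUP.get((a, b), 1.0)


job_pool = ["matmul", "nbody", "vecadd", "transpose"]
best = select_fusion_pair(job_pool, toy_is_suitable, toy_predict_speedup)
print(best)
```

Passing the classifier and predictor as plain callables keeps the selection logic independent of whichever models are actually used; the toy rule here only mirrors the abstract's observation that two kernels competing for the same resources are poor fusion candidates.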


Bibliographic details

Source
Computing, 2021, Issue 10, pp. 2171-2202 (32 pages)
Authors
Khalid Yasir Noman; Aleem Muhammad; Ahmed Usman; Prodan Radu; Islam Muhammad Arshad; Iqbal Muhammad Azhar
Author affiliations

HITEC Univ Taxila 47080 Pakistan;

Natl Univ Comp & Emerging Sci Islamabad 44000 Pakistan;

Western Norway Univ Appl Sci N-5004 Bergen Norway;

Alpen Adria Univ Klagenfurt A-9020 Klagenfurt Austria;

Natl Univ Comp & Emerging Sci Islamabad 44000 Pakistan;

Southwest Jiaotong Univ Sch Comp & Artificial Intelligence Chengdu 611756 Peoples R China;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Scheduling; Kernel fusion; High-performance computing; Machine learning;


