SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Lee Janghaeng; Samadi Mehrzad; Park Yongjun; Mahlke Scott

Abstract

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as the sequential code or data transfer management. This work distribution can be a poor solution as it underutilizes the CPU, has difficulty generalizing beyond the single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance-competitive with GPUs on many workloads, thus simply partitioning work based on the fixed roles may be a poor choice. In this article, we present the single-kernel multiple devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is performance improvement by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data transfer costs and the performance variations GPUs have with respect to input size. On real hardware, SKMD achieves an average speedup of 28% on a system with one multicore CPU and two asymmetric GPUs compared to a fastest-device execution strategy for a set of popular OpenCL kernels.
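
As a rough illustration of the partition, execute, and merge flow described in the abstract, the Python sketch below splits a range of work-groups across devices in proportion to an assumed per-device throughput, runs a kernel on each slice, and stitches the partial outputs back together. This is not SKMD's actual partitioning algorithm, which the abstract does not specify; the function names, the throughput figures, and the proportional-split rule are all hypothetical, and the "devices" are simulated on the host.

```python
from typing import Callable, Dict, List, Tuple


def partition_work_groups(
    total_groups: int, throughput: Dict[str, float]
) -> Dict[str, Tuple[int, int]]:
    """Split [0, total_groups) into contiguous, non-overlapping ranges.

    Each device receives a share proportional to its (assumed, non-negative)
    relative throughput. This is a hypothetical rule for illustration only.
    """
    if total_groups < 0:
        raise ValueError("total_groups must be non-negative")
    total_rate = sum(throughput.values())
    if total_rate <= 0:
        raise ValueError("at least one device needs positive throughput")

    ranges: Dict[str, Tuple[int, int]] = {}
    devices = list(throughput)
    start = 0
    cumulative = 0.0
    for i, dev in enumerate(devices):
        cumulative += throughput[dev]
        # The last device takes whatever remains so the ranges cover everything.
        if i == len(devices) - 1:
            end = total_groups
        else:
            end = round(total_groups * cumulative / total_rate)
        ranges[dev] = (start, end)
        start = end
    return ranges


def run_partial(
    kernel: Callable[[int], int],
    data: List[int],
    group_range: Tuple[int, int],
    group_size: int,
) -> Tuple[int, List[int]]:
    """Apply the kernel to the work-items covered by one range of work-groups.

    Returns the starting item offset together with the partial output.
    """
    lo = group_range[0] * group_size
    hi = min(group_range[1] * group_size, len(data))
    return lo, [kernel(x) for x in data[lo:hi]]


def merge(n: int, partials: List[Tuple[int, List[int]]]) -> List[int]:
    """Copy each partial output into its slot of the full output buffer."""
    out: List[int] = [0] * n
    for lo, chunk in partials:
        out[lo:lo + len(chunk)] = chunk
    return out


if __name__ == "__main__":
    # Hypothetical relative throughputs for one CPU and two asymmetric GPUs.
    rates = {"cpu": 1.0, "gpu0": 3.0, "gpu1": 4.0}
    data = list(range(64))
    group_size = 4
    ranges = partition_work_groups(len(data) // group_size, rates)
    partials = [run_partial(lambda x: x * x, data, r, group_size)
                for r in ranges.values()]
    print(ranges)
    print(merge(len(data), partials) == [x * x for x in data])
```

A static proportional split like this ignores the two difficulties the abstract highlights, namely exposed data transfer costs and GPU performance that varies with input size, so a real system would need a richer performance model than a single throughput number per device.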

Bibliographic record

Source
ACM Transactions on Computer Systems | 2015, Issue 3 | 9.1-9.27 | 27 pages
Authors
Lee Janghaeng; Samadi Mehrzad; Park Yongjun; Mahlke Scott
Author affiliations

Univ Michigan, Comp Sci & Engn Dept, Ann Arbor, MI 48109 USA;

Univ Michigan, Comp Sci & Engn Dept, Ann Arbor, MI 48109 USA;

Hongik Univ, Dept Elect & Elect Engn, Seoul, South Korea;

Univ Michigan, Comp Sci & Engn Dept, Ann Arbor, MI 48109 USA;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Compiler; runtime; CPU; GPU; collaboration; optimization

Similar literature

1. GaN/AlGaN multiple quantum wells grown on transparent and conductive (-201)-oriented β-Ga_2O_3 substrate for UV vertical light emitting devices [J]. Ajia I. A., Yamashita Y., Lorenz K. Applied Physics Letters, 2018, Issue 8.
2. A collaborative CPU-GPU approach for deep learning on mobile devices [J]. Olivier Valery, Pangfeng Liu, Jan-Jan Wu. Concurrency, practice and experience, 2019, Issue 17.
3. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems [C]. Lee Janghaeng, Samadi Mehrzad, Park Yongjun. International Conference on Parallel Architectures and Compilation Techniques, 2013.
4. The distribution of OpenCL kernel execution across multiple devices [D]. Gurfinkel, Steven. 2014.
5. Fine-Tuned Multilayered Transparent Electrode for Highly Transparent Perovskite Light-Emitting Devices [O]. Hua Wu, Prof. Yu Zhang, Dr. Xiaoyu Zhang.
6. Technology ready use of single layer graphene as a transparent electrode for hybrid photovoltaic devices [O]. Wang, Zhibing, Puls, Conor P., Staley, Neal E. 2011.
