
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems


Abstract

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution, as it underutilizes the CPU, has difficulty generalizing beyond the single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance-competitive with GPUs on many workloads, so simply partitioning work based on the fixed roles may be a poor choice. In this paper, we present the Single Kernel Multiple Devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is to improve performance by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data-transfer costs and the performance variation GPUs exhibit with respect to input size. On real hardware, SKMD achieves an average speedup of 29% on a system with one multicore CPU and two asymmetric GPUs, compared to a fastest-device execution strategy, for a set of popular OpenCL kernels.
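To illustrate the collaboration idea the abstract describes, the sketch below shows one way a host program could split a single OpenCL kernel's global work range across two devices using per-device work offsets. This is a minimal illustration of workload partitioning, not the SKMD framework itself; the function name, the fixed two-device setup, and the split point are assumptions for the example.

```c
/* Minimal sketch (not the actual SKMD implementation): splitting one
 * data-parallel OpenCL kernel's 1-D global work range across two devices.
 * Device/context/buffer setup and the choice of the split point are
 * assumed to be handled elsewhere. */
#include <CL/cl.h>
#include <stddef.h>

/* Hypothetical helper: give the first device work-items [0, split) and the
 * second device work-items [split, total_items). */
void launch_partitioned(cl_command_queue q_cpu, cl_command_queue q_gpu,
                        cl_kernel kernel, size_t total_items, size_t split)
{
    size_t offset_cpu = 0,     size_cpu = split;
    size_t offset_gpu = split, size_gpu = total_items - split;

    /* Both devices run the same kernel over disjoint sub-ranges, so their
     * partial outputs together cover the full index space. */
    clEnqueueNDRangeKernel(q_cpu, kernel, 1, &offset_cpu, &size_cpu,
                           NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q_gpu, kernel, 1, &offset_gpu, &size_gpu,
                           NULL, 0, NULL, NULL);

    clFinish(q_cpu);
    clFinish(q_gpu);

    /* Merging the partial output buffers and managing data transfers are
     * omitted here; per the abstract, SKMD performs these steps (and picks
     * the partition sizes) automatically. */
}
```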
