
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems


Abstract

Heterogeneous computing on CPUs and GPUs has traditionally used fixed roles for each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. Unfortunately, this work distribution can be a poor solution, as it underutilizes the CPU, has difficulty generalizing beyond the single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance-competitive with GPUs on many workloads, so simply partitioning work based on the fixed roles may be a poor choice. In this paper, we present the Single Kernel Multiple Devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is to improve performance by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data-transfer costs and the performance variation GPUs exhibit with respect to input size. On real hardware, SKMD achieves an average speedup of 29% on a system with one multicore CPU and two asymmetric GPUs, compared to a fastest-device execution strategy, for a set of popular OpenCL kernels.
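To illustrate the collaboration idea the abstract describes, the sketch below shows one way a host program could split a single OpenCL kernel's global work range across two devices using per-device work offsets. This is a minimal illustration of workload partitioning, not the SKMD framework itself; the function name, the fixed two-device setup, and the split point are assumptions for the example.

```c
/* Minimal sketch (not the actual SKMD implementation): splitting one
 * data-parallel OpenCL kernel's 1-D global work range across two devices.
 * Device/context/buffer setup and the choice of the split point are
 * assumed to be handled elsewhere. */
#include <CL/cl.h>
#include <stddef.h>

/* Hypothetical helper: give the first device work-items [0, split) and the
 * second device work-items [split, total_items). */
void launch_partitioned(cl_command_queue q_cpu, cl_command_queue q_gpu,
                        cl_kernel kernel, size_t total_items, size_t split)
{
    size_t offset_cpu = 0,     size_cpu = split;
    size_t offset_gpu = split, size_gpu = total_items - split;

    /* Both devices run the same kernel over disjoint sub-ranges, so their
     * partial outputs together cover the full index space. */
    clEnqueueNDRangeKernel(q_cpu, kernel, 1, &offset_cpu, &size_cpu,
                           NULL, 0, NULL, NULL);
    clEnqueueNDRangeKernel(q_gpu, kernel, 1, &offset_gpu, &size_gpu,
                           NULL, 0, NULL, NULL);

    clFinish(q_cpu);
    clFinish(q_gpu);

    /* Merging the partial output buffers and managing data transfers are
     * omitted here; per the abstract, SKMD performs these steps (and picks
     * the partition sizes) automatically. */
}
```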
