Journal: ACM Transactions on Computer Systems

SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration

Abstract

Heterogeneous computing on CPUs and GPUs has traditionally assigned fixed roles to each device: the GPU handles data-parallel work by taking advantage of its massive number of cores, while the CPU handles non-data-parallel work, such as sequential code or data transfer management. This work distribution can be a poor solution: it underutilizes the CPU, has difficulty generalizing beyond a single CPU-GPU combination, and may waste a large fraction of time transferring data. Further, CPUs are performance-competitive with GPUs on many workloads, so simply partitioning work based on fixed roles may be a poor choice. In this article, we present the single-kernel multiple devices (SKMD) system, a framework that transparently orchestrates collaborative execution of a single data-parallel kernel across multiple asymmetric CPUs and GPUs. The programmer is responsible for developing a single data-parallel kernel in OpenCL, while the system automatically partitions the workload across an arbitrary set of devices, generates kernels to execute the partial workloads, and efficiently merges the partial outputs together. The goal is to improve performance by maximally utilizing all available resources to execute the kernel. SKMD handles the difficult challenges of exposed data transfer costs and the performance variations GPUs exhibit with respect to input size. On real hardware, SKMD achieves an average speedup of 28% on a system with one multicore CPU and two asymmetric GPUs, compared to a fastest-device execution strategy, for a set of popular OpenCL kernels.
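The partition, execute, and merge flow described in the abstract can be illustrated with a minimal host-side sketch. It assumes each device has a fixed, known relative throughput and splits the kernel's work-groups into contiguous ranges proportional to those weights. The device names, throughput values, and helper functions (`partition_work_groups`, `run_partial`, `merge`) are hypothetical. This is not SKMD's actual partitioner, which also accounts for data transfer costs and the input-size-dependent performance of GPUs. Here, a plain Python function stands in for the OpenCL kernel.

```python
from typing import Callable, Dict, List, Tuple


def partition_work_groups(num_groups: int,
                          throughput: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Split work-groups [0, num_groups) into contiguous [lo, hi) ranges,
    sized proportionally to each (hypothetical) device's relative throughput."""
    total = sum(throughput.values())
    devices = list(throughput)
    ranges: Dict[str, Tuple[int, int]] = {}
    start = 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            count = num_groups - start  # last device takes the remainder
        else:
            count = round(num_groups * throughput[dev] / total)
            count = min(count, num_groups - start)  # never overshoot the range
        ranges[dev] = (start, start + count)
        start += count
    return ranges


def run_partial(kernel: Callable[[int], int], data: List[int],
                group_range: Tuple[int, int], group_size: int) -> Tuple[int, List[int]]:
    """Execute the kernel only on the work-items in one device's group range.
    Returns (output offset, partial output)."""
    lo, hi = group_range
    start, end = lo * group_size, min(hi * group_size, len(data))
    return start, [kernel(x) for x in data[start:end]]


def merge(partials: List[Tuple[int, List[int]]], n: int) -> List[int]:
    """Copy each device's partial output back into one full-size output buffer."""
    out = [0] * n
    for start, values in partials:
        out[start:start + len(values)] = values
    return out


# Usage: 10 work-items, work-group size 2, one CPU and two asymmetric GPUs.
data = list(range(10))
group_size = 2
num_groups = -(-len(data) // group_size)  # ceiling division -> 5 groups
throughput = {"cpu": 1.0, "gpu0": 3.0, "gpu1": 1.0}

ranges = partition_work_groups(num_groups, throughput)
partials = [run_partial(lambda x: x * x, data, r, group_size) for r in ranges.values()]
result = merge(partials, len(data))
print(ranges)  # {'cpu': (0, 1), 'gpu0': (1, 4), 'gpu1': (4, 5)}
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Contiguous ranges keep each device's output a single slice, which makes the merge step a plain copy. Kernels whose work-items write to scattered or overlapping locations would need a more careful merge than this sketch assumes.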