cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs

Abstract

While GPUs are now omnipresent in many scientific and technical computations, they continue to evolve as processors. An important recent feature is the ability to execute multiple kernels concurrently via queue streams. However, experiments show that different parameters, including the behavior of kernels, the order of kernel launches, and other execution configurations (e.g., the number of concurrent thread blocks), may result in different execution times for concurrent kernel execution. Since kernels may have different resource requirements, they can be classified into different classes, which are traditionally assumed to be either memory-bound or compute-bound. However, a kernel may belong to different classes on different hardware, depending on the hardware resources. In this paper, the definition of kernel mix intensity is introduced. Based on this, a scheduling framework called concurrent CUDA (cCUDA) is proposed to co-schedule concurrent kernels more efficiently. It first profiles and ranks kernels with different execution behaviors, and then takes the kernel resource requirements into account to partition the thread blocks of different kernels and overlap them to better utilize the GPU resources. Experimental results on real hardware demonstrate a performance improvement in execution time of up to 1.86x, and an average speedup of 1.28x, for a wide range of kernels. cCUDA is available at https://github.com/kshekofteh/cCUDA.
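The observation that one kernel can fall into different classes on different hardware can be illustrated with a small roofline-style sketch. This is not the paper's definition of kernel mix intensity and not cCUDA's scheduling algorithm: the `Kernel` and `Gpu` types, the arithmetic-intensity threshold, the pairing heuristic, and all numbers below are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    flops: float        # floating-point operations performed (hypothetical)
    bytes_moved: float  # bytes moved to/from device memory (hypothetical)


@dataclass(frozen=True)
class Gpu:
    name: str
    peak_flops: float      # peak compute throughput, FLOP/s (hypothetical)
    peak_bandwidth: float  # peak memory bandwidth, bytes/s (hypothetical)


def arithmetic_intensity(kernel: Kernel) -> float:
    """FLOPs performed per byte moved."""
    return kernel.flops / kernel.bytes_moved


def classify(kernel: Kernel, gpu: Gpu) -> str:
    """Roofline-style rule: compare kernel intensity with the GPU's balance point."""
    balance = gpu.peak_flops / gpu.peak_bandwidth
    if arithmetic_intensity(kernel) >= balance:
        return "compute-bound"
    return "memory-bound"


def pair_complementary(kernels):
    """Rank kernels by intensity and pair the lowest with the highest.

    Returns (pairs, leftover), where leftover is the unpaired kernel name
    when the number of kernels is odd, else None.
    """
    ranked = sorted(kernels, key=arithmetic_intensity)
    pairs = []
    lo, hi = 0, len(ranked) - 1
    while lo < hi:
        pairs.append((ranked[lo].name, ranked[hi].name))
        lo += 1
        hi -= 1
    leftover = ranked[lo].name if lo == hi else None
    return pairs, leftover


# Made-up devices: GPU_A has a balance point of 20 FLOP/byte, GPU_B of 5.
GPU_A = Gpu("gpu_a", peak_flops=1e13, peak_bandwidth=5e11)
GPU_B = Gpu("gpu_b", peak_flops=4e12, peak_bandwidth=8e11)

# Made-up kernels with intensities 0.25, 1, 10 and 60 FLOP/byte.
KERNELS = [
    Kernel("copy", flops=2.5e8, bytes_moved=1e9),
    Kernel("reduce", flops=1e9, bytes_moved=1e9),
    Kernel("stencil", flops=1e10, bytes_moved=1e9),
    Kernel("matmul", flops=6e10, bytes_moved=1e9),
]

if __name__ == "__main__":
    for k in KERNELS:
        print(k.name, classify(k, GPU_A), classify(k, GPU_B))
    print(pair_complementary(KERNELS))
```

In this sketch the `stencil` kernel is classified as memory-bound on `GPU_A` and compute-bound on `GPU_B`, which is the kind of hardware dependence the abstract describes. Pairing kernels from opposite ends of the ranking is one simple way to pick candidates whose resource demands may overlap well; a real scheduler would also have to account for thread-block partitioning and occupancy limits.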

Bibliographic record

Source
IEEE Transactions on Parallel and Distributed Systems, 2020, no. 4, pp. 766-778 (13 pages)

Author affiliations

Ferdowsi University of Mashhad, Department of Computer Engineering, Mashhad 9177948974, Razavi Khorasan, Iran

Heidelberg University, Institute of Computer Engineering, D-69117 Heidelberg, Germany

Record information
Original format: PDF
Text language: English
Keywords
Kernel; Graphics processing units; Benchmark testing; Hardware; Scheduling; Analytical models; concurrent kernel execution; stream; resource management

Similar literature

1. Using machine learning techniques to analyze the performance of concurrent kernel execution on GPUs [J]. Pablo Carvalho, Esteban Clua, Aline Paes. Future Generation Computer Systems, 2020, issue Deca.
2. Scalable CAIM discretization on multiple GPUs using concurrent kernels [J]. Alberto Cano, Sebastian Ventura, Krzysztof J. Cios. Journal of Supercomputing, 2014, no. 1.
3. Communication and computation optimization of concurrent kernels using kernel coalesce on a GPU [J]. B. Neelima, G. Ram Mohana Reddy, Prakash S. Raghavendra. Concurrency and Computation: Practice and Experience, 2015, no. 1.
4. Algorithms for Preemptive Co-scheduling of Kernels on GPUs [C]. Lionel Eyraud-Dubois, Cristiana Bentes. International Conference on High Performance Computing, Data, and Analytics, 2020.
5. On implementation and optimization of large-data scientific kernels on multicore processors and GPUs [D]. Hakeem, Mohammad Umar. 2013.
6. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization [O]. A. Peter Ruymgaart, Ron Elber.
7. Algorithms for Preemptive Co-scheduling of Kernels on GPUs [O]. Lionel Eyraud-Dubois, Cristiana Bentes. 2020.
8. Linux Kernel Co-Scheduling for Bulk Synchronous Parallel Applications [R]. Jones, T. 2013.
