Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming

Qiumin Xu; Hyeran Jeon; Keunsoo Kim; Won Woo Ro; Murali Annavaram

Abstract

As technology scales, GPUs are forecast to incorporate an ever-increasing amount of computing resources to support thread-level parallelism. But even with the best effort, exposing massive thread-level parallelism from a single GPU kernel, particularly from general-purpose applications, is going to be a difficult challenge. In some cases, even if there is sufficient thread-level parallelism in a kernel, there may not be enough available memory bandwidth to support such massive concurrent thread execution. Hence, GPU resources may be underutilized as more general-purpose applications are ported to execute on GPUs. In this paper, we explore multiprogramming GPUs as a way to resolve the resource underutilization issue. There is growing hardware support for multiprogramming on GPUs. Hyper-Q, introduced in the Kepler architecture, enables multiple kernels to be invoked via tens of hardware queue streams. Spatial multitasking has been proposed to partition GPU resources across multiple kernels, but the partitioning is done at the coarse granularity of streaming multiprocessors (SMs), where each kernel is assigned to a subset of SMs. In this paper, we advocate partitioning a single SM across multiple kernels, which we term intra-SM slicing. We explore various intra-SM slicing strategies that slice resources within each SM to concurrently run multiple kernels on the SM. Our results show that no single intra-SM slicing strategy derives the best performance for all application pairs. We propose Warped-Slicer, a dynamic intra-SM slicing strategy that uses an analytical method to calculate the SM resource partitioning across different kernels that maximizes performance. The model relies on a set of short online profile runs to determine how each kernel's performance varies as more thread blocks from each kernel are assigned to an SM. The model takes into account the interference effect of shared resource usage across multiple kernels. The model is also computationally efficient and can determine the resource partitioning quickly to enable dynamic decision making as new kernels enter the system. We demonstrate that the proposed Warped-Slicer approach improves performance by 23% over the baseline multiprogramming approach with minimal hardware overhead.
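
The abstract does not spell out the analytical model, so the following is only an illustrative sketch of the general idea: given profiled curves of performance versus thread-block count for each kernel, pick a per-SM split that respects the SM's resource budget. Everything here is an assumption made for illustration: the function name `best_partition`, the single abstract resource budget, the max-min objective, the brute-force search (swapped in for the paper's analytical method), and the profile numbers themselves. Interference between co-running kernels is not modelled.

```python
from itertools import product


def best_partition(curves, costs, capacity):
    """Brute-force search for a per-SM thread-block split (illustrative only).

    curves[k][n-1]: profiled performance of kernel k with n thread blocks
                    on one SM (hypothetical numbers, not from the paper).
    costs[k]:       resource units one thread block of kernel k consumes
                    (a single abstract resource; an assumption).
    capacity:       resource units available in one SM.

    Assumed objective: maximise the worst normalised performance across
    kernels, where each curve is normalised by its own peak.
    """
    best, best_score = None, -1.0
    ranges = [range(1, len(curve) + 1) for curve in curves]
    for alloc in product(*ranges):
        # Skip splits that do not fit in the SM's resource budget.
        if sum(n * cost for n, cost in zip(alloc, costs)) > capacity:
            continue
        score = min(curve[n - 1] / max(curve) for curve, n in zip(curves, alloc))
        if score > best_score:
            best, best_score = alloc, score
    return best, best_score


# Hypothetical profiles: kernel A keeps scaling with more thread blocks,
# kernel B saturates early (for example, because it is bandwidth-bound).
kernel_a = [1.0, 1.9, 2.7, 3.4, 4.0, 4.5, 4.9, 5.2]
kernel_b = [1.0, 1.8, 2.0, 2.0, 1.9, 1.8, 1.7, 1.6]

alloc, score = best_partition([kernel_a, kernel_b], costs=[1, 1], capacity=8)
print(alloc, round(score, 3))
```

With these made-up curves the search prefers an uneven split that gives the saturating kernel only a few thread blocks, which is the kind of outcome a fixed even split cannot express.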

Bibliographic record

Source
Computer Architecture News, 2016, No. 3, pp. 230-242 (13 pages)
Authors
Qiumin Xu; Hyeran Jeon; Keunsoo Kim; Won Woo Ro; Murali Annavaram
Author affiliations

Ming Hsieh Department of Electrical Engineering, University of Southern California;

Department of Computer Engineering, San Jose State University;

School of Electrical and Electronic Engineering, Yonsei University;

School of Electrical and Electronic Engineering, Yonsei University;

Ming Hsieh Department of Electrical Engineering, University of Southern California;

Original format: PDF
Language: English
Keywords
GPUs; scheduling; multiprogramming; multi-kernel; resource management

Similar literature

1. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs [J]. Jason Jong Kyu Park, Yongjun Park, Scott Mahlke. Computer Architecture News, 2017, No. 1.
2. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs [J]. Jason Jong Kyu Park, Yongjun Park, Scott Mahlke. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2017, No. 4.
3. Pursuing Coordinated Trajectory Progression and Efficient Resource Utilization of GPU-Enabled Molecular Dynamics Simulations [J]. Design & Test, IEEE, 2014, No. 1.
4. Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming [C]. Qiumin Xu, Hyeran Jeon, Keunsoo Kim. ACM/IEEE Annual International Symposium on Computer Architecture, 2016.
5. Architectural Support for Efficient GPU Multiprogramming [D]. Zhen Lin. 2019.
6. Accelerating calculations of RNA secondary structure partition functions using GPUs [O]. Harry A Stern, David H Mathews. 2013.
7. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs [O]. Jason Jong Kyu Park, Yongjun Park, Scott Mahlke. 2017.
