Venue: Design, Automation & Test in Europe Conference and Exhibition

Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking

Abstract

High-level programming languages have transformed graphics processing units (GPUs) from domain-restricted devices into powerful compute platforms. Yet many "general-purpose GPU" (GPGPU) applications fail to fully utilize the GPU resources. Executing multiple applications simultaneously on different regions of the GPU (spatial multitasking) thus improves system performance. However, within-die process variations lead to significantly different maximum operating frequencies (Fmax) of the streaming multiprocessors (SMs) within a GPU. As the chip size and the number of SMs per chip increase, the frequency variation is also expected to increase, exacerbating the problem. The increased number of SMs also provides a unique opportunity: we can allocate resources to concurrently executing applications based on how those applications are affected by the different available Fmax values. In this paper, we study the effects of per-SM clocking on spatial multitasking-capable GPUs. We demonstrate two factors that affect the performance of simultaneously running applications: (i) the SM partitioning algorithm that decides how many resources to assign to each application, and (ii) the assignment of SMs to applications based on the operating frequencies of those SMs and the applications' characteristics. Our experimental results show that spatial multitasking that partitions SMs based on application characteristics, when combined with per-SM clocking, can improve application performance by up to 46% on average compared to cooperative multitasking with global clocking.
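To make factor (ii) concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes the partitioning step has already fixed how many SMs each application receives, and that each application carries a hypothetical `freq_sensitivity` score (higher meaning more compute-bound). The `App` type, the `assign_sms` function, and the Fmax values are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    num_sms: int             # SMs granted by a (hypothetical) partitioning step
    freq_sensitivity: float  # assumed score: higher = benefits more from a fast clock


def assign_sms(fmax_mhz, apps):
    """Greedy sketch: the most frequency-sensitive app gets the fastest SMs."""
    if sum(a.num_sms for a in apps) > len(fmax_mhz):
        raise ValueError("partition requests more SMs than the GPU has")
    # SM indices ordered from highest to lowest Fmax.
    by_speed = sorted(range(len(fmax_mhz)), key=lambda i: fmax_mhz[i], reverse=True)
    assignment, cursor = {}, 0
    for app in sorted(apps, key=lambda a: a.freq_sensitivity, reverse=True):
        assignment[app.name] = by_speed[cursor:cursor + app.num_sms]
        cursor += app.num_sms
    return assignment


# Made-up per-SM Fmax values (MHz) for an 8-SM GPU; not taken from the paper.
fmax = [700, 650, 720, 610, 690, 640, 705, 600]
apps = [App("memory_bound", 4, 0.2), App("compute_bound", 4, 0.9)]
result = assign_sms(fmax, apps)
print(result)
```

In this toy setup the compute-bound application receives the four fastest SMs and the memory-bound application the four slowest. A real policy would also have to account for how the two applications contend for shared resources such as memory bandwidth, which this sketch ignores.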
