Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking


Abstract

High-level programming languages have transformed graphics processing units (GPUs) from domain-restricted devices into powerful compute platforms. Yet many "general-purpose GPU" (GPGPU) applications fail to fully utilize the GPU resources. Executing multiple applications simultaneously on different regions of the GPU (spatial multitasking) thus improves system performance. However, within-die process variations lead to significantly different maximum operating frequencies (Fmax) of the streaming multiprocessors (SMs) within a GPU. As the chip size and number of SMs per chip increase, the frequency variation is also expected to increase, exacerbating the problem. The increased number of SMs also provides a unique opportunity: we can allocate resources to concurrently executing applications based on how those applications are affected by the different available Fmax values. In this paper, we study the effects of per-SM clocking on spatial multitasking-capable GPUs. We demonstrate two factors that affect the performance of simultaneously running applications: (i) the SM partitioning algorithm that decides how many resources to assign to each application, and (ii) the assignment of SMs to applications based on the operating frequencies of those SMs and the applications' characteristics. Our experimental results show that spatial multitasking that partitions SMs based on application characteristics, when combined with per-SM clocking, can improve application performance by up to 46% on average compared to cooperative multitasking with global clocking.
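Factor (ii) in the abstract, matching SMs to applications by operating frequency, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a simple greedy policy in which the most frequency-sensitive application receives the fastest SMs, and it takes the per-application SM counts from factor (i) as given. The function name `assign_sms`, the `freq_sensitivity` scores, and all numeric values are hypothetical.

```python
from typing import Dict, List


def assign_sms(sm_fmax_mhz: List[float],
               sm_counts: Dict[str, int],
               freq_sensitivity: Dict[str, float]) -> Dict[str, List[int]]:
    """Greedy sketch (an assumption, not the paper's method):
    hand the fastest SMs to the most frequency-sensitive application."""
    if sum(sm_counts.values()) > len(sm_fmax_mhz):
        raise ValueError("partition requests more SMs than the GPU has")

    # SM indices ordered from highest to lowest Fmax.
    fastest_first = sorted(range(len(sm_fmax_mhz)),
                           key=lambda i: sm_fmax_mhz[i], reverse=True)
    # Applications ordered from most to least frequency-sensitive.
    apps = sorted(sm_counts, key=lambda a: freq_sensitivity[a], reverse=True)

    assignment: Dict[str, List[int]] = {}
    cursor = 0
    for app in apps:
        n = sm_counts[app]
        assignment[app] = fastest_first[cursor:cursor + n]
        cursor += n
    return assignment


# Hypothetical 6-SM GPU with per-SM Fmax values in MHz.
fmax = [700.0, 640.0, 720.0, 610.0, 690.0, 655.0]
# Hypothetical partition sizes (factor i) and sensitivity scores.
counts = {"compute_bound": 2, "memory_bound": 4}
sensitivity = {"compute_bound": 0.9, "memory_bound": 0.2}

result = assign_sms(fmax, counts, sensitivity)
print(result)
```

In this sketch the compute-bound application ends up on the two highest-Fmax SMs, while the memory-bound application, which is assumed to gain little from a faster core clock, takes the remaining slower ones.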


Bibliographic details

Source
Design, Automation & Test in Europe Conference and Exhibition, 2014, pp. 1-6 (6 pages)
Conference location: Dresden (DE)
Authors
Aguilera, Paula; Lee, Jungseob; Farmahini-Farahani, Amin; Morrow, Katherine
Author affiliation
University of Wisconsin-Madison
Original format: PDF

Similar literature

1. AMD GPUs as an Alternative to NVIDIA for Supporting Real-Time Workloads [J]. Nathan Otterness, James H. Anderson. LIPIcs: Leibniz International Proceedings in Informatics. 2020, No. 30
2. Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms [J]. HajiRassouliha Amir, Taberner Andrew J., Nash Martyn P. Signal Processing. Image Communication: A Publication of the European Association for Signal Processing. 2018
3. Design and Performance Evaluation of Image Processing Algorithms on GPUs [J]. Park In Kyu, Singhal Nitin, Lee Man Hee. IEEE Transactions on Parallel and Distributed Systems. 2011, No. 1
4. Process variation-aware workload partitioning algorithms for GPUs supporting spatial-multitasking [C]. Aguilera Paula, Lee Jungseob, Farmahini-Farahani Amin. Design, Automation & Test in Europe Conference and Exhibition. 2014
5. Algorithmic and software system support to accelerate data processing in CPU-GPU hybrid computing environments [D]. Wang, Kaibo. 2015
6. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O]. S.N. Swetadri Vasan, Ciprian N. Ionita, A.H. Titus
7. Workload Partitioning Algorithm Based on Performance Curve of GPU in Heterogeneous Platforms [O]. Hongyu Yang, Hui Chen, Chengming Li. 2018
