Improving GPU performance via large warps and two-level warp scheduling

Abstract

Due to their massive computational power, graphics processing units (GPUs) have become a popular platform for executing general purpose parallel applications. GPU programming models allow the programmer to create thousands of threads, each executing the same computing kernel. GPUs exploit this parallelism in two ways. First, threads are grouped into fixed-size SIMD batches known as warps, and second, many such warps are concurrently executed on a single GPU core. Despite these techniques, the computational resources on GPU cores are still underutilized, resulting in performance far short of what could be delivered. Two reasons for this are conditional branch instructions and stalls due to long latency operations. To improve GPU performance, computational resources must be more effectively utilized. To accomplish this, we propose two independent ideas: the large warp microarchitecture and two-level warp scheduling. We show that when combined, our mechanisms improve performance by 19.1% over traditional GPU cores for a wide variety of general purpose parallel applications that heretofore have not been able to fully exploit the available resources of the GPU chip.
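
The underutilization caused by conditional branches can be made concrete with a small sketch. It is not the paper's simulator or either proposed mechanism. It assumes a baseline core that, when the threads of a warp disagree on a branch, runs the taken path and the not-taken path one after the other with the inactive lanes idle, and that each path is a single instruction of equal cost. The function name `simd_utilization` and the default warp size of 32 are illustrative assumptions.

```python
def simd_utilization(takes_branch, warp_size=32):
    """Fraction of issued SIMD lane-slots that do useful work.

    takes_branch: one boolean per thread (True = thread takes the branch).
    Assumption: a divergent warp executes both paths serially, and each
    path costs one full-width issue regardless of how many lanes are active.
    """
    useful = 0  # lane-slots occupied by an active thread
    issued = 0  # lane-slots issued (warp_size per executed path)
    for start in range(0, len(takes_branch), warp_size):
        warp = takes_branch[start:start + warp_size]
        taken = sum(warp)
        not_taken = len(warp) - taken
        for active in (taken, not_taken):
            if active:  # a path with no active threads is skipped
                useful += active
                issued += warp_size
    return useful / issued if issued else 0.0


# Two warps whose threads all agree: every lane is busy.
uniform = [True] * 64
# Two warps in which neighbouring threads disagree: half the lanes idle.
alternating = [i % 2 == 0 for i in range(64)]

print(simd_utilization(uniform))      # 1.0
print(simd_utilization(alternating))  # 0.5
```

Under these assumptions, a branch that splits every warp evenly halves lane utilization, which is the kind of loss the abstract attributes to conditional branch instructions.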


Bibliographic record

Source
IEEE/ACM International Symposium on Microarchitecture | 2011 | pp. 308-317 | 10 pages
Authors
Veynu Narasiman; Michael Shebanow; Chang Joo Lee; Rustam Miftakhutdinov; Onur Mutlu; Yale N. Patt;
Original format: PDF
Keywords
Graphics processing units; Instruction sets; Processor scheduling; Pipelines; Registers; Benchmark testing; Scheduling;


Similar literature

1. Improving branch divergence performance on GPGPU with a new PDOM stack and multi-level warp scheduling [J]. Licheng Yu, Xingsheng Tang, Minghui Wu. Journal of Systems Architecture, 2014, No. 5.
2. A novel warp scheduling scheme considering long-latency operations for high-performance GPUs [J]. Cong Thuan Do, Choi Hong Jun, Chung Sung Woo. Journal of Supercomputing, 2020, No. 4.
3. CAWA: Coordinated Warp Scheduling and Cache Prioritization for Critical Warp Acceleration of GPGPU Workloads [J]. Shin-Ying Lee, Akhil Arunkumar, Carole-Jean Wu. Computer Architecture News, 2015, No. 3.
4. Improving GPU performance via large warps and two-level warp scheduling [C]. Veynu Narasiman, Michael Shebanow, Chang Joo Lee. IEEE/ACM International Symposium on Microarchitecture, 2011.
5. The Development of WARP - A Framework for Continuous Energy Monte Carlo Neutron Transport in General 3D Geometries on GPUs [D]. Bergmann, Ryan. 2014.
6. Preparation and Performances of Warp-Knitted Hernia Repair Mesh Fabricated with Chitosan Fiber [O]. Shuang Yu, Pibo Ma, Honglian Cong. 2019.
7. Warp-Aware Trace Scheduling for GPUs [O]. James A. Jablin, Thomas B. Jablin, Onur Mutlu. 2014.
