Orchestrated Scheduling and Prefetching for GPGPUs

Published in: Computer Architecture News

Abstract

In this paper, we present techniques that coordinate the thread scheduling and prefetching decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better tolerate long memory latencies. We demonstrate that existing warp scheduling policies in GPGPU architectures are unable to effectively incorporate data prefetching. The main reason is that they schedule consecutive warps, which are likely to access nearby cache blocks and thus prefetch accurately for one another, back-to-back in consecutive cycles. This either 1) causes prefetches to be generated by a warp too close to the time their corresponding addresses are actually demanded by another warp, or 2) requires sophisticated prefetcher designs to correctly predict the addresses required by a future "far-ahead" warp while executing the current warp. We propose a new prefetch-aware warp scheduling policy that overcomes these problems. The key idea is to separate in time the scheduling of consecutive warps such that they are not executed back-to-back. We show that this policy not only enables a simple prefetcher to be effective in tolerating memory latencies but also improves memory bank parallelism, even when prefetching is not employed. Experimental evaluations across a diverse set of applications on a 30-core simulated GPGPU platform demonstrate that the prefetch-aware warp scheduler provides 25% and 7% average performance improvement over baselines that employ prefetching in conjunction with, respectively, the commonly-employed round-robin scheduler or the recently-proposed two-level warp scheduler. Moreover, when prefetching is not employed, the prefetch-aware warp scheduler provides higher performance than both of these baseline schedulers as it better exploits memory bank parallelism.
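The key idea above — separating consecutive warps in time so that one warp's prefetches have enough lead time before a neighboring warp demands the same addresses — can be illustrated with a minimal grouping sketch. This is our own toy model, not the paper's scheduler or simulator: the function names and the modulo-based group assignment are assumptions made purely to contrast "neighbors back-to-back" with "neighbors separated in time".

```python
def consecutive_groups(num_warps, group_size):
    """Baseline-style grouping: consecutive warps land in the same
    scheduling group, so warps that access nearby cache blocks run
    back-to-back and prefetches arrive with little lead time."""
    return [list(range(i, i + group_size))
            for i in range(0, num_warps, group_size)]

def prefetch_aware_groups(num_warps, num_groups):
    """Prefetch-aware-style grouping (illustrative): warp w is assigned
    to group w % num_groups, so consecutive warps fall into different
    groups and execute far apart in time. A simple prefetcher triggered
    by one group can then cover the demands of a later group."""
    groups = [[] for _ in range(num_groups)]
    for w in range(num_warps):
        groups[w % num_groups].append(w)
    return groups

# With 8 warps split two ways:
print(consecutive_groups(8, 4))     # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(prefetch_aware_groups(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

In the second layout, warps 0 and 1 — likely to prefetch accurately for one another — are scheduled in different groups, giving warp 0's prefetches time to complete before warp 1 issues its demands; as a side effect, each group spreads its accesses across more memory banks, consistent with the bank-parallelism benefit the abstract reports even without prefetching.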
