Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

Guin Gilman; Samuel S. Ogden; Tian Guo; Robert J. Walls

Abstract

In this work, we empirically derive the scheduler's behavior under concurrent workloads for NVIDIA's Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we instead find that the scheduler chooses the next SM based on the SM's local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58X increase in execution time for one kernel in our experiments. We hope that our work will be useful for improving the accuracy of GPU simulators and aid in the development of novel scheduling algorithms.
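To make the contrast between the two policies concrete, here is a toy Python sketch. It assumes a simplified SM model that tracks only thread slots. The names `SM`, `place_round_robin`, `place_by_availability`, and `make_sms` are hypothetical. The "most free thread slots" rule stands in for "the SM's local resource availability" for illustration only; it is not necessarily the exact policy the paper derives. Real SMs are also limited by registers, shared memory, and the number of resident blocks.

```python
from dataclasses import dataclass


@dataclass
class SM:
    """Toy streaming multiprocessor that tracks only thread slots (hypothetical model)."""
    sm_id: int
    max_threads: int
    used_threads: int = 0

    def free(self) -> int:
        return self.max_threads - self.used_threads

    def fits(self, threads: int) -> bool:
        return self.free() >= threads


def place_round_robin(sms, block_threads, num_blocks):
    """Visit SMs in a fixed cyclic order, skipping any SM without room for the block."""
    placement = []
    idx = 0
    for _ in range(num_blocks):
        for _ in range(len(sms)):
            sm = sms[idx % len(sms)]
            idx += 1
            if sm.fits(block_threads):
                sm.used_threads += block_threads
                placement.append(sm.sm_id)
                break
        else:
            placement.append(None)  # no SM can host this block right now
    return placement


def place_by_availability(sms, block_threads, num_blocks):
    """Send each block to the SM with the most free thread slots (ties go to the lowest id)."""
    placement = []
    for _ in range(num_blocks):
        candidates = [sm for sm in sms if sm.fits(block_threads)]
        if not candidates:
            placement.append(None)  # no SM can host this block right now
            continue
        best = max(candidates, key=lambda sm: (sm.free(), -sm.sm_id))
        best.used_threads += block_threads
        placement.append(best.sm_id)
    return placement


def make_sms():
    """Two SMs; a concurrent kernel A is assumed to already hold 1024 threads on SM 0."""
    sms = [SM(0, 2048), SM(1, 2048)]
    sms[0].used_threads = 1024
    return sms


if __name__ == "__main__":
    # Kernel B launches two 512-thread blocks while kernel A is still resident.
    print("round-robin: ", place_round_robin(make_sms(), 512, 2))     # [0, 1]
    print("availability:", place_by_availability(make_sms(), 512, 2))  # [1, 1]
```

In this toy model, the round-robin placer spreads kernel B across both SMs. The availability-based placer instead packs both blocks onto the emptier SM 1. Where a block lands therefore depends on what co-running kernels already occupy, which hints at how a resource-driven policy can produce placements that look counter-intuitive under concurrency.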

Bibliographic Details

Source
Performance Evaluation Review, 2020, No. 3, pp. 81-88 (8 pages)
Authors
Guin Gilman; Samuel S. Ogden; Tian Guo; Robert J. Walls
Affiliation

Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA, USA (all four authors)
Record Information
Original format: PDF
Text language: English
Keywords
Concurrent kernels; GPGPUs; scheduling algorithms

Similar Documents

1. Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU (journal article). Chen Zhao, Wu Gao, Feiping Nie. Future Generation Computer Systems, 2020 (issue listed as "Nova").
2. cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs (journal article). IEEE Transactions on Parallel and Distributed Systems, 2020, no. 4.
3. Demystifying the 16×16 thread-block for stencils on the GPU (journal article). Siham Tabik, Maurice Peemen, Nicolas Guil. Concurrency and Computation: Practice and Experience, 2015, no. 18.
4. Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels (conference paper). Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil. International Conference on Parallel Architectures and Compilation Techniques, 2014.
5. Performance Evaluation of Blocking and Non-Blocking Concurrent Queues on GPUs (thesis). Hossein Pourmeidani, 2018.
6. Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels (other). Sreepathi Pai, R. Govindarajan, Matthew J, 2015.
