Journal: Performance Evaluation Review

Demystifying the Placement Policies of the NVIDIA GPU Thread Block Scheduler for Concurrent Kernels

Abstract

In this work, we empirically derive the thread block scheduler's behavior under concurrent workloads for NVIDIA's Pascal, Volta, and Turing microarchitectures. In contrast to past studies that suggest the scheduler uses a round-robin policy to assign thread blocks to streaming multiprocessors (SMs), we find that the scheduler chooses the next SM based on the SM's local resource availability. We show how this scheduling policy can lead to significant, and seemingly counter-intuitive, performance degradation; for example, a decrease of one thread per block resulted in a 3.58x increase in execution time for one kernel in our experiments. We hope that our work will help improve the accuracy of GPU simulators and aid the development of novel scheduling algorithms.
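
The difference between a round-robin policy and an availability-based policy can be illustrated with a toy simulation. The sketch below is not the scheduler's actual algorithm, which the abstract does not specify. It assumes a single hypothetical resource (free thread slots per SM) and a simple "pick the SM with the most free slots" rule, and the function names `place_round_robin` and `place_most_available` are illustrative only.

```python
def place_round_robin(free, demands):
    """Toy round-robin placement: try SMs in cyclic order, starting
    after the SM that received the previous block.

    free    -- free thread slots per SM (hypothetical single resource)
    demands -- thread slots required by each block, in launch order
    Returns the SM index chosen for each block, or None if no SM fits.
    """
    free = list(free)  # work on a copy so the caller's list is untouched
    n = len(free)
    next_sm = 0
    placement = []
    for need in demands:
        chosen = None
        for offset in range(n):
            sm = (next_sm + offset) % n
            if free[sm] >= need:
                chosen = sm
                break
        placement.append(chosen)
        if chosen is not None:
            free[chosen] -= need
            next_sm = (chosen + 1) % n
    return placement


def place_most_available(free, demands):
    """Toy availability-based placement: always pick the SM with the
    most free slots (ties go to the lowest SM index). This rule is an
    assumption made for illustration, not the measured hardware policy.
    """
    free = list(free)
    placement = []
    for need in demands:
        sm = max(range(len(free)), key=lambda i: free[i])
        if free[sm] >= need:
            free[sm] -= need
            placement.append(sm)
        else:
            placement.append(None)
    return placement


if __name__ == "__main__":
    # Hypothetical scenario: SM 1 is already half occupied by another kernel.
    free_slots = [2048, 1024, 2048]
    blocks = [512] * 4
    print(place_round_robin(free_slots, blocks))     # [0, 1, 2, 0]
    print(place_most_available(free_slots, blocks))  # [0, 2, 0, 2]
```

In this hypothetical scenario the round-robin rule still sends a block to the partly occupied SM 1, while the availability-based rule avoids it until the other SMs are equally loaded. Small changes in per-block resource demand can therefore change which SMs a concurrent kernel lands on.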