Conference: IEEE/ACM International Symposium on Microarchitecture

Improving GPU performance via large warps and two-level warp scheduling

Abstract

Due to their massive computational power, graphics processing units (GPUs) have become a popular platform for executing general purpose parallel applications. GPU programming models allow the programmer to create thousands of threads, each executing the same computing kernel. GPUs exploit this parallelism in two ways. First, threads are grouped into fixed-size SIMD batches known as warps, and second, many such warps are concurrently executed on a single GPU core. Despite these techniques, the computational resources on GPU cores are still underutilized, resulting in performance far short of what could be delivered. Two reasons for this are conditional branch instructions and stalls due to long latency operations. To improve GPU performance, computational resources must be more effectively utilized. To accomplish this, we propose two independent ideas: the large warp microarchitecture and two-level warp scheduling. We show that when combined, our mechanisms improve performance by 19.1% over traditional GPU cores for a wide variety of general purpose parallel applications that heretofore have not been able to fully exploit the available resources of the GPU chip.
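The underutilization that the abstract attributes to conditional branches can be illustrated with a toy model. The sketch below is not the paper's mechanism or its evaluation method. It assumes a warp width of 32 (not stated in the abstract), assumes both sides of a branch take the same number of cycles, and assumes a warp whose threads disagree on a branch executes both paths one after the other with the non-participating lanes idle. The name `simd_utilization` is hypothetical.

```python
from typing import List

WARP_SIZE = 32  # assumed warp width; the abstract does not give one


def simd_utilization(taken: List[bool], warp_size: int = WARP_SIZE) -> float:
    """Fraction of SIMD lane-slots doing useful work for one branch.

    taken[i] is thread i's branch outcome. Threads are grouped into
    fixed-size warps in index order. A warp issues once per distinct
    outcome among its threads, and every issue occupies all lanes.
    """
    useful = 0
    issued = 0
    for start in range(0, len(taken), warp_size):
        warp = taken[start:start + warp_size]
        paths = len(set(warp))         # 1 if uniform, 2 if divergent
        useful += len(warp)            # each thread runs exactly one path
        issued += paths * warp_size    # idle lanes still consume slots
    return useful / issued


# Two warps whose threads all agree: every lane is busy.
print(simd_utilization([True] * 64))                     # 1.0
# Two warps where alternating threads disagree: half the lanes idle.
print(simd_utilization([i % 2 == 0 for i in range(64)])) # 0.5
```

In this simplified model, a divergent branch halves lane utilization regardless of how many threads take each side, which is the kind of waste the abstract says leaves GPU cores underutilized.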
