CRAT: Enabling Coordinated Register Allocation and Thread-Level Parallelism Optimization for GPUs

Abstract

The key to high performance on GPUs lies in massive threading, which enables thread switching and hides long latencies. GPUs are equipped with a large register file to enable fast context switching. However, thread throttling techniques designed to mitigate cache contention lead to under-utilization of registers. Register allocation is a significant factor for performance, as it not only determines single-thread performance but also indirectly affects thread-level parallelism (TLP). In this paper, we propose Coordinated Register Allocation and Thread-level parallelism (CRAT) to explore the optimization space of register allocation and TLP management on GPUs. CRAT employs both compile-time (CRAT-static) and run-time (CRAT-dyn) techniques to exhaust the design space. CRAT-static works statically to explore the trade-off between TLP and register allocation, and CRAT-dyn exploits dynamic register allocation for further improvement. Experiments indicate that CRAT-static achieves an average 1.25X speedup over an existing TLP management technique. On four register-limited applications, CRAT-dyn further improves the speedup of CRAT-static from 1.51X to 1.70X.
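
The abstract's point that register allocation indirectly affects TLP can be illustrated with a rough occupancy calculation. The sketch below is not CRAT's algorithm; it only shows how per-thread register usage caps the number of resident warps on a streaming multiprocessor (SM). The hardware limits (`REGS_PER_SM`, `MAX_WARPS_PER_SM`) and the function name `max_resident_warps` are assumptions chosen for illustration, and register allocation granularity is ignored.

```python
# Illustrative sketch only: the limits below are assumed values,
# not figures taken from the paper or from a specific GPU.
REGS_PER_SM = 65536      # assumed number of 32-bit registers per SM
MAX_WARPS_PER_SM = 64    # assumed hardware cap on resident warps per SM
WARP_SIZE = 32           # threads per warp


def max_resident_warps(regs_per_thread: int) -> int:
    """Upper bound on resident warps per SM imposed by register usage alone."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    regs_per_warp = regs_per_thread * WARP_SIZE
    # The register file and the hardware warp cap both limit residency.
    return min(MAX_WARPS_PER_SM, REGS_PER_SM // regs_per_warp)


if __name__ == "__main__":
    for regs in (32, 64, 128):
        print(f"{regs} regs/thread -> {max_resident_warps(regs)} warps")
```

Under these assumed limits, giving each thread more registers reduces the number of warps that can be resident, while throttling threads leaves registers unused. That is the trade-off the abstract describes between register allocation and TLP.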

Bibliographic details

  • Source
    Fortschritte der Physik | 2018, Issue 6 | 8 pages
  • Author affiliations

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Peking Univ Sch EECS Ctr Energy Efficient Comp & Applicat Beijing 100080 Peoples R China;

    Chinese Acad Sci Inst Comp Technol Beijing 100864 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Physics;
  • Keywords

    GPGPU; memory hierarchy; compilers;

  • Date added to database: 2022-08-20 03:50:02
