Locality based warp scheduling in GPGPUs

Abstract

As the need for high-performance computing continues to grow, it becomes increasingly urgent to design massively multi-core processors with high throughput and efficiency. However, as the number of cores keeps increasing, on-chip memory capacity remains insufficient. In a multi-core processor such as a GPGPU (General-Purpose Graphics Processing Unit), dozens or hundreds of SMs (Streaming Multiprocessors) cooperate to achieve high throughput with only several MB of on-chip memory. Furthermore, within one SM, thousands of threads are organized into thread blocks that process instructions in a SIMT (Single Instruction, Multiple Threads) manner. Because all threads share the same on-chip memory, the mismatch between the large number of cores and the small on-chip memory capacity can easily impair performance through excessive thread contention for cache resources.

An efficient thread scheduling method is a promising way to alleviate these problems and boost performance. From the hardware perspective, instructions are executed by warps, each made up of a fixed number of threads. We therefore propose a novel warp scheduling scheme that maintains data locality and relieves cache pollution and thrashing. First, to make full use of temporal locality, we put the disordered warps into a supervised warp queue and issue the warps from oldest to youngest. Second, to exploit spatial locality and hide computation unit stalls, we put forward a new insertion method called LPI (Locality Protected Insertion), which reorders warps in the supervised warp queue so that long-latency warps are better hidden by short-latency warps, such as those performing ALU operations and on-chip accesses. Over a wide variety of applications, the new scheduling method improves performance by up to 10.1%, and by 2.2% on average, over the baseline loose round-robin scheduling.
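The difference between the baseline and the oldest-first issue order described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Warp` class, its `age` and `ready` fields, and both selection functions are hypothetical names, not the paper's implementation. The sketch covers only the two issue orders; LPI is not modeled, because the abstract does not specify its insertion rule.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    wid: int      # warp id (hypothetical field)
    age: int      # smaller value = older warp (assumed arrival order)
    ready: bool   # True if the warp can issue this cycle


def loose_round_robin(warps, last_issued):
    """Baseline: pick the next ready warp after the position issued last."""
    n = len(warps)
    for step in range(1, n + 1):
        candidate = warps[(last_issued + step) % n]
        if candidate.ready:
            return candidate
    return None  # every warp is stalled


def oldest_first(warps):
    """Queue-style policy: pick the oldest warp that is ready to issue."""
    ready = [w for w in warps if w.ready]
    return min(ready, key=lambda w: w.age) if ready else None


warps = [
    Warp(wid=0, age=3, ready=True),
    Warp(wid=1, age=0, ready=False),  # oldest warp, but stalled
    Warp(wid=2, age=1, ready=True),
    Warp(wid=3, age=2, ready=True),
]

lrr_pick = loose_round_robin(warps, last_issued=2)
old_pick = oldest_first(warps)
print(lrr_pick.wid, old_pick.wid)  # prints "3 2"
```

Round-robin moves on to the next position regardless of age, while the oldest-first policy keeps returning to the same older warps, which is one plausible reading of how issuing "from oldest to youngest" helps temporal locality.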

Bibliographic record

  • Source
    Future Generation Computer Systems | 2018, Issue 5 | pp. 520-527 | 8 pages
  • Author affiliations

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology;

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology;

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology;

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology;

    Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology;

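  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords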

    GPGPU; Warp scheduling; Locality; Reordering;
