Fair and cache blocking aware warp scheduling for concurrent kernel execution on GPU

Abstract

With Graphics Processing Units (GPUs) widely adopted in data centers to provide computing power, efficient support for GPU multitasking has attracted significant attention. Prior work on GPU multitasking includes spatial multitasking and simultaneous multikernel (SMK) execution. Spatial multitasking allocates GPU resources at the coarse granularity of whole streaming multiprocessors (SMs), whereas SMK runs concurrent kernels on the same SM and is therefore fine-grained. SMK can improve GPU resource utilization, especially when the concurrent kernels have complementary characteristics. However, the main challenge for SMK is interference among the kernels, especially contention for the data cache. In this paper, we propose a fair and cache blocking aware warp scheduling (FCBWS) approach that mitigates data cache contention and improves SMK on GPUs. FCBWS gives each kernel an equal opportunity to issue instructions and tries to avoid memory pipeline stalls by predicting cache blocking. We extract kernels from various applications to construct concurrent kernel execution benchmarks. Simulation results show that FCBWS outperforms previous multitasking methods; even compared with the state-of-the-art SMK method, FCBWS improves system throughput (STP) by 10% on average and reduces average normalized turnaround time (ANTT) by 41% on average.
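The two mechanisms named in the abstract (equal issue opportunity per kernel, and skipping instructions predicted to block the memory pipeline) can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the authors' FCBWS design: the per-kernel issue counter used for fairness, the MSHR-based blocking predictor, and all names (`Warp`, `L1DCache`, `pick_warp`) are hypothetical. The sketch also computes STP and ANTT using their commonly used definitions (STP as the sum over kernels of single-run time divided by concurrent-run time, ANTT as the mean of the inverse ratio), assuming the paper follows these standard definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Warp:
    kernel_id: int      # which concurrent kernel the warp belongs to
    warp_id: int
    next_is_mem: bool   # hypothetical flag: next instruction accesses the L1 data cache
    ready: bool = True  # operands available, not waiting on a barrier, etc.


@dataclass
class L1DCache:
    free_mshrs: int     # hypothetical: free miss-status holding registers

    def predicts_blocking(self, warp: Warp) -> bool:
        # Assumed predictor: a memory instruction is predicted to block the
        # memory pipeline when no MSHR is free to record a new miss.
        # (Pessimistically treats every memory access as a potential miss.)
        return warp.next_is_mem and self.free_mshrs == 0


def pick_warp(warps: List[Warp], issued: Dict[int, int],
              l1d: L1DCache) -> Optional[Warp]:
    """Choose the next warp to issue from, or None if nothing can issue."""
    # Cache-blocking awareness: skip warps whose next instruction is
    # predicted to stall the memory pipeline.
    candidates = [w for w in warps if w.ready and not l1d.predicts_blocking(w)]
    if not candidates:
        return None
    # Fairness: favor the kernel that has issued the fewest instructions so
    # far; break ties by warp id so the choice is deterministic.
    chosen = min(candidates, key=lambda w: (issued.get(w.kernel_id, 0), w.warp_id))
    issued[chosen.kernel_id] = issued.get(chosen.kernel_id, 0) + 1
    return chosen


def stp(t_alone: List[float], t_shared: List[float]) -> float:
    """System throughput: sum of each kernel's normalized progress (higher is better)."""
    return sum(a / s for a, s in zip(t_alone, t_shared))


def antt(t_alone: List[float], t_shared: List[float]) -> float:
    """Average normalized turnaround time: mean slowdown (lower is better)."""
    return sum(s / a for a, s in zip(t_alone, t_shared)) / len(t_alone)


if __name__ == "__main__":
    warps = [Warp(0, 0, next_is_mem=True), Warp(1, 1, next_is_mem=False),
             Warp(0, 2, next_is_mem=False)]
    issued = {0: 5, 1: 10}
    # With no free MSHRs, warp 0's load is skipped; kernel 0 is least served.
    print(pick_warp(warps, issued, L1DCache(free_mshrs=0)).warp_id)  # 2
    print(round(stp([100, 200], [150, 250]), 3))   # 1.467
    print(round(antt([100, 200], [150, 250]), 3))  # 1.375
```

Prioritizing the least-served kernel is only one simple way to approximate equal issue opportunity; a realistic scheduler would combine it with a baseline warp policy (for example, greedy-then-oldest) and would predict blocking only for accesses likely to miss in the cache.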

Bibliographic details

  • Source
    Future Generation Computer Systems | 2020, Issue 11 | pp. 1093-1105 | 13 pages
  • Author affiliations

    School of Computer, Northwestern Polytechnical University, Xi'an, China;

    School of Computer, Northwestern Polytechnical University, Xi'an, China;

    School of Computer, Northwestern Polytechnical University, Xi'an, China;

    Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China;

    Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, USA;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    GPU; Concurrent kernels; Warp scheduling; Cache blocking; Interference;
