IEEE Computer Architecture Letters

Inter-Core Locality Aware Memory Scheduling



Abstract

Graphics Processing Units (GPUs) run thousands of parallel threads and achieve high Memory-Level Parallelism (MLP). To support this parallelism, a structure called a Miss-Status Holding Register (MSHR) tracks multiple in-flight miss requests. When multiple cores send requests for the same cache line, the requests are merged into a single last-level cache MSHR entry, and only one memory request is sent to the Dynamic Random-Access Memory (DRAM). We call this inter-core locality. Its main cause is multiple cores accessing shared read-only data within the same cache line. By prioritizing memory requests with high inter-core locality, more threads resume execution sooner. In this paper, we analyze the causes of inter-core locality and show that requests exhibiting it are more critical to performance. We propose a GPU DRAM scheduler that exploits inter-core locality information detected at the last-level cache MSHRs. For benchmarks with high inter-core locality, this yields an average 28 percent reduction in memory request latency and an 11 percent improvement in performance.
