Journal: Computer Architecture News

Access Pattern-Aware Cache Management for Improving Data Utilization in GPU


Abstract

The long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (collections of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought in by a warp load instruction is used only once, classified as streaming data; (2) data brought in by a warp load is reused multiple times within the same warp, called intra-warp locality; (3) data brought in by a warp is reused multiple times but across different warps, called inter-warp locality; and (4) some data exhibit a mix of both intra- and inter-warp locality. Furthermore, each load instruction consistently exhibits the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must use per-load locality type information rather than warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on the load's locality characterization. Using an extensive set of simulations, we show that APCM improves the performance of GPUs by 34% for cache-sensitive applications while saving 27% of energy consumption over a baseline GPU.
