Evaluating and Mitigating Bandwidth Bottlenecks Across the Memory Hierarchy in GPUs

Abstract

GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand to the off-chip memory. However, the cache hierarchy presents its own bandwidth limitations in sustaining such high levels of memory traffic. In this paper, we characterize the bandwidth bottlenecks present across the memory hierarchy in GPUs for general-purpose applications. We quantify the stalls throughout the memory hierarchy and identify the architectural parameters that play a pivotal role in leading to a congested memory system. We explore the architectural design space to mitigate the bandwidth bottlenecks and show that the performance improvement achieved by mitigating the bandwidth bottleneck in the cache hierarchy can exceed the speedup obtained by a memory system with a baseline cache hierarchy and High Bandwidth Memory (HBM) DRAM. We also show that addressing the bandwidth bottleneck in isolation at specific levels can be sub-optimal and can even be counter-productive. Therefore, we show that it is imperative to resolve the bandwidth bottlenecks synergistically across different levels of the memory hierarchy. With the insights developed in this paper, we perform a cost-benefit analysis and identify cost-effective configurations of the memory hierarchy that effectively mitigate the bandwidth bottlenecks. We show that our final configuration achieves a performance improvement of 29% on average with a minimal area overhead of 1.6%.
