Journal: Future Generation Computer Systems

Accurately modeling the on-chip and off-chip GPU memory subsystem



Abstract

Research on GPU architecture is becoming pervasive in both academia and industry because these architectures offer much higher performance per watt than typical CPU architectures. This is the main reason why massive deployment of GPU multiprocessors is considered one of the most feasible ways to attain exascale computing capabilities.

The memory hierarchy of the GPU is a critical research topic, since its design goals differ widely from those of conventional CPU memory hierarchies. Researchers typically use detailed microarchitectural simulators to explore novel designs that better support GPGPU computing and improve the performance of GPU and CPU–GPU systems. In this context, the memory hierarchy is a critical and continuously evolving subsystem.

Unfortunately, the fast evolution of current memory subsystems degrades the accuracy of existing state-of-the-art simulators. This paper focuses on accurately modeling the entire (both on-chip and off-chip) GPU memory subsystem. For this purpose, we identify four main memory-related components that affect overall performance accuracy. Three of them belong to the on-chip memory hierarchy: (i) memory request coalescing mechanisms, (ii) miss status holding registers, and (iii) the cache coherence protocol; the fourth is the memory controller and the activity of the GDDR memory.

To evaluate and quantify our claims, we accurately modeled these memory components in an extended version of the state-of-the-art Multi2Sim heterogeneous CPU–GPU processor simulator. Experimental results show important deviations, which can change the final system performance reported by the simulation framework by up to a factor of three. The proposed GPU model has been compared and validated against the original framework and against results from a real AMD Southern-Islands 7870HD GPU.
