Computer Architecture News

MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability

机译:MCM-GPU:多芯片模块GPU,可实现持续的性能可扩展性


Abstract

Historically, improvements in GPU-based high-performance computing have been tightly coupled to transistor scaling. As Moore's law slows down and the number of transistors per die no longer grows at historical rates, the performance curve of single monolithic GPUs will ultimately plateau. However, the need for higher-performing GPUs continues to exist in many domains. To address this need, in this paper we demonstrate that package-level integration of multiple GPU modules to build larger logical GPUs can enable continuous performance scaling beyond Moore's law. Specifically, we propose partitioning GPUs into easily manufacturable basic GPU Modules (GPMs) and integrating them on package using high-bandwidth, power-efficient signaling technologies. We lay out the details and evaluate the feasibility of a basic Multi-Chip-Module GPU (MCM-GPU) design. We then propose three architectural optimizations that significantly improve GPM data locality and minimize sensitivity to inter-GPM bandwidth. Our evaluation shows that, compared to the basic MCM-GPU architecture, the optimized MCM-GPU achieves a 22.8% speedup and a 5x reduction in inter-GPM bandwidth. Most importantly, the optimized MCM-GPU design is 45.5% faster than the largest implementable monolithic GPU and performs within 10% of a hypothetical (and unbuildable) monolithic GPU. Lastly, we show that our optimized MCM-GPU is 26.8% faster than an equally equipped Multi-GPU system with the same total number of SMs and DRAM bandwidth.
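The abstract's point that GPM data locality determines how much traffic crosses the inter-GPM links can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' mechanism. The four-GPM package, contiguous CTA-to-GPM scheduling, round-robin page interleaving, and first-touch page placement are all assumptions chosen for illustration. The names `schedule_ctas` and `remote_fraction` are hypothetical.

```python
"""Toy model of inter-GPM memory traffic in a multi-chip-module GPU.

Hypothetical illustration: the GPM count, CTA scheduling, and page
placement policies are assumptions, not the design evaluated in the paper.
"""

NUM_GPMS = 4  # assumed number of GPU modules (GPMs) on the package


def schedule_ctas(num_ctas, num_gpms=NUM_GPMS):
    """Map CTA index -> GPM, giving each GPM a contiguous range of CTAs (assumed policy)."""
    per_gpm = -(-num_ctas // num_gpms)  # ceiling division
    return [cta // per_gpm for cta in range(num_ctas)]


def remote_fraction(accesses, cta_to_gpm, placement, num_gpms=NUM_GPMS):
    """Return the fraction of accesses served by another GPM's local DRAM.

    accesses:   iterable of (cta, page) pairs in program order.
    placement:  "interleaved" homes page p on GPM p % num_gpms;
                "first_touch" homes a page on the GPM that touches it first.
    """
    home = {}  # page -> owning GPM, filled lazily for first-touch placement
    total = remote = 0
    for cta, page in accesses:
        gpm = cta_to_gpm[cta]
        if placement == "interleaved":
            owner = page % num_gpms
        elif placement == "first_touch":
            owner = home.setdefault(page, gpm)
        else:
            raise ValueError(f"unknown placement policy: {placement!r}")
        total += 1
        if owner != gpm:
            remote += 1  # this access would cross an inter-GPM link
    if total == 0:
        return 0.0
    return remote / total


if __name__ == "__main__":
    # Hypothetical workload: 16 CTAs, each sweeping 8 private pages twice.
    num_ctas, pages_per_cta = 16, 8
    accesses = [(cta, cta * pages_per_cta + i)
                for _ in range(2)
                for cta in range(num_ctas)
                for i in range(pages_per_cta)]
    cta_to_gpm = schedule_ctas(num_ctas)
    for policy in ("interleaved", "first_touch"):
        print(f"{policy}: {remote_fraction(accesses, cta_to_gpm, policy):.2f}")
```

Running it prints `interleaved: 0.75` and `first_touch: 0.00`. With round-robin interleaving, three of every four accesses leave the requesting GPM. Placing each page on the GPM that first touches it keeps this workload entirely local. The workload is deliberately idealized, with no data shared between CTAs. The model only shows why scheduling and placement decisions control how much traffic must cross the package.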