IEEE International Parallel and Distributed Processing Symposium

Exploiting Adaptive Data Compression to Improve Performance and Energy-Efficiency of Compute Workloads in Multi-GPU Systems



Abstract

Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale the number of transistors on chip in order to satisfy ever-increasing demands for computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast, and energy-efficient monolithic GPUs while growing the number of on-chip stream processing units is no longer a viable way to scale performance. GPU vendors are therefore turning to multi-GPU solutions, interconnecting multiple GPUs within a single node via a high-bandwidth network (such as NVLink), or exploiting Multi-Chip-Module (MCM) packaging, where multiple GPU modules are integrated into a single package. Inter-GPU bandwidth is an expensive and critical resource in multi-GPU systems, and the design of the inter-GPU network can impact performance significantly. To address this challenge, in this paper we explore the potential of hardware-based memory compression algorithms to save bandwidth and improve energy efficiency in multi-GPU systems. Specifically, we propose an adaptive inter-GPU data compression scheme that improves both performance and energy efficiency. Our evaluation shows that the proposed optimization on multi-GPU architectures can reduce inter-GPU traffic by up to 62%, improve system performance by up to 33%, and save 45% of the energy spent powering the communication fabric, on average.
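The abstract does not specify how the adaptive scheme chooses a compression algorithm. As a hedged illustration only, the sketch below shows the general idea behind adaptive selection: for each outgoing packet, try several candidate encoders and ship whichever result is smallest, falling back to the raw payload when no scheme saves bytes. The names `rle_compress` and `adaptive_compress` are hypothetical, and `zlib` stands in for the (unspecified) hardware compressors evaluated in the paper.

```python
import zlib

def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoding: (count, byte) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.extend((run, data[i]))
        i += run
    return bytes(out)

def adaptive_compress(packet: bytes) -> tuple[str, bytes]:
    """Try each candidate compressor and transmit the smallest encoding.

    The chosen scheme name would be carried in the packet header so the
    receiving GPU knows how to decode; 'raw' guarantees we never expand
    incompressible traffic.
    """
    candidates = {
        "raw": packet,
        "rle": rle_compress(packet),
        "zlib": zlib.compress(packet),
    }
    scheme = min(candidates, key=lambda k: len(candidates[k]))
    return scheme, candidates[scheme]
```

For example, a 64-byte all-zero packet (common in sparse GPU data) collapses to a 2-byte RLE encoding, while a packet of 64 distinct byte values is sent raw, since both RLE and zlib would expand it.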
