IEEE International Parallel and Distributed Processing Symposium

Exploiting Adaptive Data Compression to Improve Performance and Energy-Efficiency of Compute Workloads in Multi-GPU Systems



Abstract

Graphics Processing Unit (GPU) performance has relied heavily on our ability to scale the number of transistors on chip in order to satisfy ever-increasing demands for computation. However, transistor scaling has become extremely challenging, limiting the number of transistors that can be crammed onto a single die. Manufacturing large, fast, and energy-efficient monolithic GPUs while growing the number of on-chip stream processing units is no longer a viable way to scale performance. GPU vendors are therefore turning to multi-GPU solutions, interconnecting multiple GPUs within a single node via a high-bandwidth network (such as NVLink), or exploiting Multi-Chip-Module (MCM) packaging, where multiple GPU modules are integrated into a single package. Inter-GPU bandwidth is an expensive and critical resource in multi-GPU systems, and the design of the inter-GPU network can impact performance significantly. To address this challenge, in this paper we explore the potential of hardware-based memory compression algorithms to save bandwidth and improve energy efficiency in multi-GPU systems. Specifically, we propose an adaptive inter-GPU data compression scheme that improves both performance and energy efficiency. Our evaluation shows that the proposed optimization on multi-GPU architectures can reduce inter-GPU traffic by up to 62%, improve system performance by up to 33%, and save 45% of the energy spent powering the communication fabric, on average.
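The abstract does not specify how the adaptive scheme chooses a compression algorithm. As a hedged illustration only, the sketch below shows the general idea behind adaptive selection: for each outgoing packet, try several candidate encoders and ship whichever result is smallest, falling back to the raw payload when no scheme saves bytes. The names `rle_compress` and `adaptive_compress` are hypothetical, and `zlib` stands in for the (unspecified) hardware compressors evaluated in the paper.

```python
import zlib

def rle_compress(data: bytes) -> bytes:
    """Toy run-length encoding: (count, byte) pairs, count capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out.extend((run, data[i]))
        i += run
    return bytes(out)

def adaptive_compress(packet: bytes) -> tuple[str, bytes]:
    """Try each candidate compressor and transmit the smallest encoding.

    The chosen scheme name would be carried in the packet header so the
    receiving GPU knows how to decode; 'raw' guarantees we never expand
    incompressible traffic.
    """
    candidates = {
        "raw": packet,
        "rle": rle_compress(packet),
        "zlib": zlib.compress(packet),
    }
    scheme = min(candidates, key=lambda k: len(candidates[k]))
    return scheme, candidates[scheme]
```

For example, a 64-byte all-zero packet (common in sparse GPU data) collapses to a 2-byte RLE encoding, while a packet of 64 distinct byte values is sent raw, since both RLE and zlib would expand it.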
