International Symposium on Microarchitecture

DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission



Abstract

Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, images and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled data centers and the compute needed to service demand. The state-of-the-art techniques to improve DNN performance have significant limitations in bridging the gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck. To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques - (1) synapse vector elimination, a technique that identifies non-contributing synapses in the DNN and carefully transforms data and removes the computation and data movement of these synapses while fully utilizing the GPU to improve performance, and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN is able to achieve an average speedup of 2.1× on real GPU hardware. We also introduce a small additional hardware unit per GPU core to facilitate efficient data fission operations, increasing the speedup achieved by DeftNN to 2.6×.
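The first technique, synapse vector elimination, removes whole groups of non-contributing synapses and then compacts the remaining data, so the reduced computation is still a dense matrix multiply that GPUs execute efficiently (unlike fine-grained pruning, which yields irregular sparsity). The following is a minimal NumPy sketch of that idea; the magnitude-based scoring used to decide which synapse vectors are "non-contributing" is an illustrative assumption, not the paper's actual selection criterion:

```python
import numpy as np

def synapse_vector_eliminate(W, X, keep_ratio=0.75):
    """Illustrative sketch of synapse vector elimination.

    Scores each column of the weight matrix W (a "synapse vector"),
    drops the lowest-scoring columns together with the corresponding
    rows of the input X, and returns a smaller *dense* operand pair.
    Because the result stays dense and contiguous, the reduced GEMM
    still maps well onto GPU matrix pipelines.
    """
    n_keep = max(1, int(W.shape[1] * keep_ratio))
    scores = np.abs(W).sum(axis=0)                # contribution proxy (assumed heuristic)
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of contributing synapse vectors
    return W[:, keep], X[keep, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))   # weights: 64 neurons x 128 synapses
X = rng.standard_normal((128, 32))   # inputs:  128 features x 32 samples
W_r, X_r = synapse_vector_eliminate(W, X, keep_ratio=0.75)
Y_reduced = W_r @ X_r                # same output shape, ~25% fewer FLOPs
print(Y_reduced.shape)               # (64, 32)
```

The key design point the abstract emphasizes is the data transformation: compacting the surviving rows and columns keeps GPU utilization high, which is why this removes real execution time where irregular pruning does not.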
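The second technique, near-compute data fission, cuts on-chip data movement by holding values in a narrower packed format and expanding them only immediately before the arithmetic. As a rough stand-in for the paper's mechanism (which uses custom packing formats and, in one variant, a small per-core hardware unit), the sketch below packs 32-bit values into 16 bits, assuming the DNN's values tolerate the precision loss:

```python
import numpy as np

def fission_pack(block):
    # Store the tile in a narrower 16-bit on-chip format,
    # halving the bytes moved through shared memory / registers.
    return block.astype(np.float16)

def fission_unpack(packed):
    # Expand back to full precision "near compute", i.e. just
    # before the multiply-accumulate consumes the values.
    return packed.astype(np.float32)

tile = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
packed = fission_pack(tile)
restored = fission_unpack(packed)
print(packed.nbytes, tile.nbytes)   # half the on-chip traffic: 32 vs 64 bytes
```

In software alone the pack/unpack conversions cost instructions, which is why the abstract reports a larger speedup (2.6x vs 2.1x) when a small dedicated hardware unit performs the fission operations.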
