International Symposium on Microarchitecture

DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission



Abstract

Deep neural networks (DNNs) are key computational building blocks for emerging classes of web services that interact in real time with users via voice, images and video inputs. Although GPUs have gained popularity as a key accelerator platform for deep learning workloads, the increasing demand for DNN computation leaves a significant gap between the compute capabilities of GPU-enabled data centers and the compute needed to service demand. The state-of-the-art techniques to improve DNN performance have significant limitations in bridging the gap on real systems. Current network pruning techniques remove computation, but the resulting networks map poorly to GPU architectures, yielding no performance benefit or even slowdowns. Meanwhile, current bandwidth optimization techniques focus on reducing off-chip bandwidth while overlooking on-chip bandwidth, a key DNN bottleneck. To address these limitations, this work introduces DeftNN, a GPU DNN execution framework that targets the key architectural bottlenecks of DNNs on GPUs to automatically and transparently improve execution performance. DeftNN is composed of two novel optimization techniques - (1) synapse vector elimination, a technique that identifies non-contributing synapses in the DNN and carefully transforms data and removes the computation and data movement of these synapses while fully utilizing the GPU to improve performance, and (2) near-compute data fission, a mechanism for scaling down the on-chip data movement requirements within DNN computations. Our evaluation of DeftNN spans 6 state-of-the-art DNNs. By applying both optimizations in concert, DeftNN is able to achieve an average speedup of 2.1× on real GPU hardware. We also introduce a small additional hardware unit per GPU core to facilitate efficient data fission operations, increasing the speedup achieved by DeftNN to 2.6×.
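The first technique, synapse vector elimination, removes whole groups of non-contributing synapses and then compacts the remaining data, so the reduced computation is still a dense matrix multiply that GPUs execute efficiently (unlike fine-grained pruning, which yields irregular sparsity). The following is a minimal NumPy sketch of that idea; the magnitude-based scoring used to decide which synapse vectors are "non-contributing" is an illustrative assumption, not the paper's actual selection criterion:

```python
import numpy as np

def synapse_vector_eliminate(W, X, keep_ratio=0.75):
    """Illustrative sketch of synapse vector elimination.

    Scores each column of the weight matrix W (a "synapse vector"),
    drops the lowest-scoring columns together with the corresponding
    rows of the input X, and returns a smaller *dense* operand pair.
    Because the result stays dense and contiguous, the reduced GEMM
    still maps well onto GPU matrix pipelines.
    """
    n_keep = max(1, int(W.shape[1] * keep_ratio))
    scores = np.abs(W).sum(axis=0)                # contribution proxy (assumed heuristic)
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of contributing synapse vectors
    return W[:, keep], X[keep, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))   # weights: 64 neurons x 128 synapses
X = rng.standard_normal((128, 32))   # inputs:  128 features x 32 samples
W_r, X_r = synapse_vector_eliminate(W, X, keep_ratio=0.75)
Y_reduced = W_r @ X_r                # same output shape, ~25% fewer FLOPs
print(Y_reduced.shape)               # (64, 32)
```

The key design point the abstract emphasizes is the data transformation: compacting the surviving rows and columns keeps GPU utilization high, which is why this removes real execution time where irregular pruning does not.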
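The second technique, near-compute data fission, cuts on-chip data movement by holding values in a narrower packed format and expanding them only immediately before the arithmetic. As a rough stand-in for the paper's mechanism (which uses custom packing formats and, in one variant, a small per-core hardware unit), the sketch below packs 32-bit values into 16 bits, assuming the DNN's values tolerate the precision loss:

```python
import numpy as np

def fission_pack(block):
    # Store the tile in a narrower 16-bit on-chip format,
    # halving the bytes moved through shared memory / registers.
    return block.astype(np.float16)

def fission_unpack(packed):
    # Expand back to full precision "near compute", i.e. just
    # before the multiply-accumulate consumes the values.
    return packed.astype(np.float32)

tile = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
packed = fission_pack(tile)
restored = fission_unpack(packed)
print(packed.nbytes, tile.nbytes)   # half the on-chip traffic: 32 vs 64 bytes
```

In software alone the pack/unpack conversions cost instructions, which is why the abstract reports a larger speedup (2.6x vs 2.1x) when a small dedicated hardware unit performs the fission operations.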
