IBM Journal of Research and Development

Optimizing the efficiency of deep learning through accelerator virtualization


Abstract

Training deep learning models often occupies entire compute clusters, built solely for this purpose, for days or even weeks at a time. There exists a large body of work on approaches for improving training performance, ranging from novel algorithms to full custom hardware accelerators. Offering compute capabilities of multiple teraflops (trillion floating point operations per second), graphics processing units (GPUs) have established themselves as a de-facto standard for accelerating deep learning network training. As systems with up to 16 GPUs—each GPU consuming up to 300 W—become available, efficient usage of these resources becomes imperative. We conduct a detailed analysis of deep learning workloads to characterize their efficiency in making use of GPU acceleration. We found that many deep learning workloads consume only a fraction of GPU resources, and we demonstrate how sharing GPU resources can improve throughput by a factor of 3, effectively turning a 4-GPU commodity cloud system into a high-end 12-GPU supercomputer. Using Watson workloads from three major areas that incorporate deep learning technology—i.e., language classification, visual recognition, and speech recognition—we document the effectiveness and scalability of our approach. We are working toward enabling GPU virtualization not only to reduce cost, but also to accelerate new breakthroughs in deep learning by increasing compute capacity without making further hardware investments.
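The core idea reported in the abstract, placing several under-utilizing training jobs on each physical GPU to recover idle capacity, can be sketched as a simple round-robin job placement. This is only a minimal illustration under assumed names (`GPUS`, `JOBS_PER_GPU`, `assign_gpu`); the paper's actual virtualization layer is considerably more involved, and a real launcher would set `CUDA_VISIBLE_DEVICES` when spawning each training process:

```python
GPUS = 4          # physical GPUs in the node (as in the 4-GPU cloud system described)
JOBS_PER_GPU = 3  # assumed oversubscription factor matching the reported 3x throughput gain

def assign_gpu(job_id, gpus=GPUS):
    """Round-robin placement: several training jobs share one device."""
    return job_id % gpus

# Map 12 jobs onto 4 GPUs, 3 jobs per device.
placements = {}
for job in range(GPUS * JOBS_PER_GPU):
    placements.setdefault(assign_gpu(job), []).append(job)

for gpu, jobs in sorted(placements.items()):
    # A real launcher would start each job with CUDA_VISIBLE_DEVICES=str(gpu).
    print(f"GPU {gpu}: jobs {jobs}")
```

With this placement, the 4-GPU node behaves like a 12-slot system, which is the sense in which the authors describe turning a 4-GPU commodity cloud system into the equivalent of a 12-GPU machine, provided each job uses only a fraction of a device's compute and memory.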
