IBM Journal of Research and Development

Optimizing the efficiency of deep learning through accelerator virtualization


Abstract

Training deep learning models often occupies entire compute clusters, built solely for this purpose, for days or even weeks at a time. There exists a large body of work on approaches for improving training performance, ranging from novel algorithms to full custom hardware accelerators. Offering compute capabilities of multiple teraflops (trillion floating point operations per second), graphics processing units (GPUs) have established themselves as a de-facto standard for accelerating deep learning network training. As systems with up to 16 GPUs—each GPU consuming up to 300 W—become available, efficient usage of these resources becomes imperative. We conduct a detailed analysis of deep learning workloads to characterize their efficiency in making use of GPU acceleration. We found that many deep learning workloads consume only a fraction of GPU resources, and we demonstrate how sharing GPU resources can improve throughput by a factor of 3, effectively turning a 4-GPU commodity cloud system into a high-end 12-GPU supercomputer. Using Watson workloads from three major areas that incorporate deep learning technology—i.e., language classification, visual recognition, and speech recognition—we document the effectiveness and scalability of our approach. We are working toward enabling GPU virtualization not only to reduce cost, but also to accelerate new breakthroughs in deep learning by increasing compute capacity without making further hardware investments.
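The core idea reported in the abstract, placing several under-utilizing training jobs on each physical GPU to recover idle capacity, can be sketched as a simple round-robin job placement. This is only a minimal illustration under assumed names (`GPUS`, `JOBS_PER_GPU`, `assign_gpu`); the paper's actual virtualization layer is considerably more involved, and a real launcher would set `CUDA_VISIBLE_DEVICES` when spawning each training process:

```python
GPUS = 4          # physical GPUs in the node (as in the 4-GPU cloud system described)
JOBS_PER_GPU = 3  # assumed oversubscription factor matching the reported 3x throughput gain

def assign_gpu(job_id, gpus=GPUS):
    """Round-robin placement: several training jobs share one device."""
    return job_id % gpus

# Map 12 jobs onto 4 GPUs, 3 jobs per device.
placements = {}
for job in range(GPUS * JOBS_PER_GPU):
    placements.setdefault(assign_gpu(job), []).append(job)

for gpu, jobs in sorted(placements.items()):
    # A real launcher would start each job with CUDA_VISIBLE_DEVICES=str(gpu).
    print(f"GPU {gpu}: jobs {jobs}")
```

With this placement, the 4-GPU node behaves like a 12-slot system, which is the sense in which the authors describe turning a 4-GPU commodity cloud system into the equivalent of a 12-GPU machine, provided each job uses only a fraction of a device's compute and memory.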
