
Large Scale Distributed Deep Networks


Abstract

Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
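The core idea behind Downpour SGD, as described in the abstract, is a central parameter server that many model replicas read from and write to asynchronously: each replica fetches the current parameters, computes a gradient on its own data shard, and pushes the update back without waiting for the others. The following is a minimal illustrative sketch of that pattern, not the DistBelief implementation; the `ParameterServer` and `worker` names, the toy least-squares objective, and the thread-based "replicas" are all assumptions made for the example.

```python
import threading
import random

class ParameterServer:
    """Holds the shared model parameters; replicas fetch and push asynchronously."""

    def __init__(self, dim):
        self.params = [0.0] * dim
        self.lock = threading.Lock()

    def fetch(self):
        # A replica grabs a (possibly slightly stale) copy of the parameters.
        with self.lock:
            return list(self.params)

    def push(self, grad, lr=0.05):
        # Apply a gradient update as soon as it arrives, regardless of
        # which parameter version it was computed against.
        with self.lock:
            for i, g in enumerate(grad):
                self.params[i] -= lr * g

def worker(server, shard, steps=200):
    # One model replica: repeatedly fetch parameters, compute a gradient
    # on its own data shard for the loss (w*x - y)^2, and push it back.
    for _ in range(steps):
        w = server.fetch()
        x, y = random.choice(shard)
        err = w[0] * x - y
        grad = [2.0 * err * x]
        server.push(grad)

random.seed(0)
# Synthetic noiseless data y = 3x, partitioned across two replicas.
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 21)]
server = ParameterServer(dim=1)
threads = [threading.Thread(target=worker, args=(server, data[i::2]))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(round(server.params[0], 2))  # should converge near 3.0
```

Because updates are applied against whatever parameter version a replica last fetched, the gradients are slightly stale; the paper's observation is that with appropriate learning rates this asynchrony trades a small amount of update consistency for much higher hardware utilization.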
