Nexus: Bringing Efficient and Scalable Training to Deep Learning Frameworks

Abstract

Demand is mounting in industry for scalable GPU-based deep learning systems. Unfortunately, existing training applications built atop popular deep learning frameworks, including Caffe, Theano, and Torch, are incapable of conducting distributed GPU training over large-scale clusters. To remedy this situation, this paper presents Nexus, a platform that allows existing deep learning frameworks to easily scale out to multiple machines without sacrificing model accuracy. Nexus leverages a recently proposed distributed parameter management architecture to orchestrate distributed training by a large number of learners spread across the cluster. Based on a characterization of the run-time behavior of existing single-node applications, Nexus is equipped with a suite of optimization schemes, including hierarchical and hybrid parameter aggregation, an enhanced network and computation layer, and quality-guided communication adjustment, to strengthen the communication channels and improve resource utilization. Empirical evaluations with a diverse set of deep learning applications demonstrate that Nexus is easy to integrate and can deliver efficient distributed training services to major deep learning frameworks. In addition, Nexus's optimization schemes are highly effective at shortening the training time within targeted accuracy bounds.

Bibliographic details

Source
IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems | 2017 | 264p | 10 pages
Authors
Yandong Wang; Li Zhang; Yufei Ren; Wei Zhang
Original format: PDF
Chinese Library Classification: TP3-53
Keywords
Training; Machine learning; Graphics processing units; Neural networks; Computational modeling; Optimization; Feature extraction

Similar literature

1. Efficient Training Management for Mobile Crowd-Machine Learning: A Deep Reinforcement Learning Approach [J] . Tran The Anh, Nguyen Cong Luong, Niyato Dusit, Wireless Communications Letters, IEEE . 2019, No. 5

2. Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting [J] . Cui Zhiyong, Henrickson Kristian, Ke Ruimin, IEEE Transactions on Intelligent Transportation Systems . 2020, No. 11

3. Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey [J] . Nguyen Giang, Dlugolinsky Stefan, Bobak Martin, Artificial Intelligence Review: An International Science and Engineering Journal . 2019, No. 1

4. Nexus: Bringing Efficient and Scalable Training to Deep Learning Frameworks [C] . Yandong Wang, Li Zhang, Yufei Ren, 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems . 2017

5. Emerging Opportunities in Machine Learning Hardware Acceleration: From Advanced Neural Networks Implementation to Ultra-efficient Deep Learning Framework Using Next Generation Technology [D] . Cai, Ruizhe . 2020

6. EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA–protein interaction prediction [O] . Jingjing Wang, Yanpeng Zhao, Weikang Gong, 2021

7. Deep-Edge: An Efficient Framework for Deep Learning Model Update on Heterogeneous Edge [O] . Anirban Bhattacharjee, Ajay Dev Chhokra, Hongyang Sun, 2020
