Conference: International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Deep Learning Training Management Platform Based on Distributed Technologies in Resource-Constrained Scenarios

Abstract

Deep learning has attracted considerable research attention in the past few years for its efficiency and accuracy. However, two problems arise in practice. First, the computing power of a single machine is limited and is not suited to training deep learning models with a massive number of cells. Second, training models on different deep learning frameworks is costly. Motivated by these problems, this paper proposes a deep learning training management platform based on distributed technologies, which integrates different deep learning frameworks through virtualization technologies and coordinates machines through distributed technologies. Specifically, dedicated algorithms are proposed to solve the multi-task scheduling problem, the computing resource allocation problem, and the fault tolerance problem in resource-limited scenarios. The platform turns out to be widely usable by small and medium-sized research teams.