
JOB SCHEDULING METHOD FOR DISTRIBUTED DEEP LEARNING OVER A SHARED GPU CLUSTER AND COMPUTER-READABLE RECORDING MEDIUM


Abstract

The job scheduling method according to the present invention is a method for scheduling deep-learning training jobs on a shared GPU cluster. It comprises: a determination step of determining a GPU quota for each of a plurality of jobs; an estimation step of estimating, for each job, the learning rate if one additional GPU were assigned; an extraction step of extracting the job with the largest increase in speedup based on the estimated learning rates; an allocation step of adding one GPU to the quota of the extracted job; and an iteration step of repeating the estimation, extraction, and allocation steps in sequence until at least one GPU has been allocated to every one of the plurality of jobs, wherein the speedup is (learning rate when using one GPU)/(learning rate upon additional GPU assignment). According to the distributed deep-learning job scheduling method for a shared GPU cluster of the present invention, and the computer-readable recording medium on which it is recorded, the GPU cluster can be managed efficiently by distributing jobs so that the entire set of GPUs is utilized as fully as possible, based on the improvement in training speed of each deep-learning model. In particular, when scheduling a plurality of deep-learning training jobs on a GPU cluster at the same time, the method has the technical effect of minimizing both the average job completion time and the overall completion time of all jobs.
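The determination, estimation, extraction, allocation, and iteration steps described in the abstract can be sketched as a greedy loop. The job names, the per-quota learning-rate table, and the cluster size below are hypothetical illustrations (the patent does not specify them), and `speedup` is taken here as the ratio of a job's estimated learning rate at a given quota to its single-GPU rate.

```python
# A minimal sketch of the greedy, speedup-driven GPU allocation in the abstract.
# All concrete numbers and names are hypothetical illustrations.

def speedup(rates, gpus):
    """Speedup of a job at a given GPU count, relative to its single-GPU rate."""
    return rates[gpus] / rates[1]

def schedule(estimated_rates, total_gpus):
    """Greedily grant one GPU at a time to the job whose estimated speedup
    would increase the most (the extraction + allocation steps), repeating
    until the cluster's GPUs are exhausted (the iteration step)."""
    # Determination step: every job starts with a quota of one GPU.
    quota = {job: 1 for job in estimated_rates}
    remaining = total_gpus - len(quota)
    while remaining > 0:
        def gain(job):
            rates = estimated_rates[job]
            if quota[job] + 1 not in rates:      # no estimate for a larger quota
                return float("-inf")
            return speedup(rates, quota[job] + 1) - speedup(rates, quota[job])
        best = max(quota, key=gain)              # job with the largest speedup increase
        quota[best] += 1                         # grant it one more GPU
        remaining -= 1
    return quota

# Estimation step: per-job learning rates (e.g. samples/sec) at each quota.
estimated_rates = {
    "jobA": {1: 100, 2: 190, 3: 260, 4: 310},   # scales well
    "jobB": {1:  80, 2: 150, 3: 200, 4: 230},   # scales moderately
    "jobC": {1:  50, 2:  60, 3:  65, 4:  68},   # scales poorly
}
print(schedule(estimated_rates, total_gpus=6))  # → {'jobA': 3, 'jobB': 2, 'jobC': 1}
```

Because each added GPU yields diminishing returns, the greedy loop naturally concentrates extra GPUs on the jobs that scale best while still guaranteeing every job at least one GPU.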
