SYSTEM AND METHOD FOR DYNAMIC SCHEDULING OF DISTRIBUTED DEEP LEARNING TRAINING JOBS
Abstract
A scheduling algorithm for training deep neural network (DNN) weights on processing units (PUs) uses a doubling heuristic to identify the next job to which a PU is provisionally assigned. The doubling heuristic uses an estimated number of training sets needed to complete training of the weights for a given job, a training speed function that indicates how fast the weights are converging, or both. The algorithm addresses the problem of assigning PUs efficiently when multiple DNN weight data structures must be trained. In some embodiments, the weights are trained using a ring-based message passing architecture. In some embodiments, scheduling proceeds in a nested-loop fashion. In inner iterations of the nested loop, PUs are scheduled and jobs are launched or restarted. In outer iterations of the nested loop, jobs are stopped, parameters are updated, and the inner iteration is re-entered.
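The abstract does not spell out the doubling rule, so the following is only a minimal sketch of one plausible reading, not the claimed method. It assumes that each job starts with one PU, that the scheduler repeatedly doubles the allocation of whichever job would save the most estimated remaining time, and that remaining work is measured in a hypothetical `remaining_epochs` field divided by a hypothetical `speed(n)` function. The names `Job`, `doubling_schedule`, `run_rounds`, and `sublinear_speed` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def sublinear_speed(n: int) -> float:
    # Assumed speed model: throughput grows sublinearly with the PU count.
    return n ** 0.8


@dataclass
class Job:
    name: str
    remaining_epochs: float        # assumed stand-in for "training sets needed"
    speed: Callable[[int], float]  # assumed training speed function of PU count

    def time_left(self, pus: int) -> float:
        # Estimated time to finish if the job keeps `pus` PUs.
        return self.remaining_epochs / self.speed(pus)


def doubling_schedule(jobs: List[Job], total_pus: int) -> Dict[str, int]:
    """Inner iterations: provisionally assign PUs, doubling one job at a time."""
    alloc: Dict[str, int] = {}
    free = total_pus
    for job in jobs:               # seed each job with one PU while any remain
        if free == 0:
            break
        alloc[job.name] = 1
        free -= 1
    while True:
        best, best_gain = None, 0.0
        for job in jobs:
            cur = alloc.get(job.name, 0)
            if cur == 0 or cur > free:   # doubling needs `cur` additional PUs
                continue
            gain = job.time_left(cur) - job.time_left(2 * cur)
            if gain > best_gain:
                best, best_gain = job, gain
        if best is None:           # no job can double, or no doubling helps
            return alloc
        free -= alloc[best.name]
        alloc[best.name] *= 2


def run_rounds(jobs: List[Job], total_pus: int,
               interval: float = 1.0, max_rounds: int = 1000) -> List[Dict[str, int]]:
    """Outer iterations: stop jobs, update estimates, re-enter the inner loop."""
    history: List[Dict[str, int]] = []
    active = list(jobs)
    for _ in range(max_rounds):
        if not active:
            break
        alloc = doubling_schedule(active, total_pus)
        history.append(alloc)
        for job in active:         # simulate one interval of training progress
            n = alloc.get(job.name, 0)
            if n:
                job.remaining_epochs -= job.speed(n) * interval
        active = [j for j in active if j.remaining_epochs > 0]
    return history


if __name__ == "__main__":
    demo = [Job("A", 100.0, sublinear_speed), Job("B", 10.0, sublinear_speed)]
    print(doubling_schedule(demo, total_pus=8))
```

In this sketch the simulated progress update stands in for the parameter update the abstract mentions for outer iterations; a real system would replace it with measured convergence data.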