TRAINING MACHINE LEARNING MODELS ON A LARGE-SCALE DISTRIBUTED SYSTEM USING A JOB SERVER
Abstract
A computer system for training machine learning models includes a job server and a plurality of compute nodes. The job server receives jobs for training machine learning models and allocates each training job to a group of one or more compute nodes (a training group). The allocation is based on the current requirements of the training jobs and the current status of the compute nodes. The training jobs include updating values for the parameters (e.g., weights and biases) of the machine learning models. Preferably, the compute nodes in a training group communicate the updated parameter values among themselves in order to complete the training job.
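The abstract does not say how requirements are matched to node status or how the exchanged parameter values are combined. The Python sketch below is therefore a hypothetical illustration, not the patented method. It invents the names `JobServer`, `ComputeNode`, `TrainingJob`, `local_update` and `exchange_and_average`. It also assumes three things: each job's "current requirement" is a node count plus GPUs per node, the server prefers idle nodes with the most free GPUs, and a training group combines its locally updated weights and biases by simple element-wise averaging, which simulates an all-reduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ComputeNode:
    # Hypothetical status record: free GPUs and whether the node is already in a group.
    node_id: str
    free_gpus: int
    busy: bool = False


@dataclass
class TrainingJob:
    # Hypothetical "current requirements" of a training job.
    job_id: str
    nodes_required: int
    gpus_per_node: int


class JobServer:
    """Assigns whole idle nodes to training jobs based on job needs and node status."""

    def __init__(self, nodes: List[ComputeNode]) -> None:
        self.nodes: Dict[str, ComputeNode] = {n.node_id: n for n in nodes}
        self.allocations: Dict[str, List[str]] = {}

    def allocate(self, job: TrainingJob) -> Optional[List[str]]:
        # Only idle nodes with enough free GPUs are eligible.
        candidates = [n for n in self.nodes.values()
                      if not n.busy and n.free_gpus >= job.gpus_per_node]
        if len(candidates) < job.nodes_required:
            return None  # Not enough capacity right now; the job waits.
        # Assumed policy: prefer the nodes with the most free GPUs.
        candidates.sort(key=lambda n: n.free_gpus, reverse=True)
        group = candidates[:job.nodes_required]
        for n in group:
            n.busy = True  # Nodes are assigned exclusively to this training group.
        self.allocations[job.job_id] = [n.node_id for n in group]
        return self.allocations[job.job_id]

    def release(self, job_id: str) -> None:
        # Return the job's nodes to the idle pool once training completes.
        for node_id in self.allocations.pop(job_id, []):
            self.nodes[node_id].busy = False


def local_update(params: List[float], grads: List[float], lr: float) -> List[float]:
    # One gradient-descent step computed independently on a single node.
    return [p - lr * g for p, g in zip(params, grads)]


def exchange_and_average(updated: List[List[float]]) -> List[float]:
    # Simulates the group exchanging updated values: every node ends up with the mean.
    count = len(updated)
    return [sum(column) / count for column in zip(*updated)]


if __name__ == "__main__":
    server = JobServer([ComputeNode("n0", 4), ComputeNode("n1", 8), ComputeNode("n2", 2)])
    group = server.allocate(TrainingJob("job-a", nodes_required=2, gpus_per_node=4))
    print(group)  # ['n1', 'n0']
    params = [0.5, -0.2]  # one weight and one bias, shared starting values
    per_node = [local_update(params, [0.2, 0.1], 0.5), local_update(params, [0.4, -0.1], 0.5)]
    print(exchange_and_average(per_node))
    server.release("job-a")
```

In this sketch a node with too few free GPUs (`n2`) is never placed in a group. Calling `release` returns a group's nodes to the idle pool. A real system would more likely use a collective-communication library for the exchange step than a Python list average.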