A system for distributed training of a machine learning model over a plurality of computing nodes, comprising a server connected to the plurality of computing nodes and configured to control training of the machine learning model in a plurality of training iterations. Each of the training iterations comprises: instructing each of the computing nodes to train a respective local copy of the machine learning model by locally computing a respective one of a plurality of cumulative gradients, each including one or more gradients; obtaining the cumulative gradients from the computing nodes; and creating an updated machine learning model by merging the machine learning model with an aggregated value of the cumulative gradients. During the obtaining and creating phases, one or more of the computing nodes compute a new respective cumulative gradient, which is merged with the machine learning model in a following training iteration.
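The overlap between the server's merge step and the nodes' next computation can be illustrated with a minimal Python sketch. It is not the claimed implementation: the quadratic loss, the mean as the aggregated value, the learning rates, the thread pool standing in for remote nodes, and the names `cumulative_gradient`, `merge`, and `train` are all assumptions made for illustration. Network transfer is not modelled, so the overlap shown covers only the creating phase.

```python
from concurrent.futures import ThreadPoolExecutor


def cumulative_gradient(model, target, local_steps=2, local_lr=0.1):
    """Node side (hypothetical): sum one or more gradients on a local model copy."""
    local = list(model)                      # local copy of the model
    total = [0.0] * len(model)
    for _ in range(local_steps):
        # Gradient of the assumed loss 0.5 * (w - target)^2.
        grad = [w - t for w, t in zip(local, target)]
        total = [a + g for a, g in zip(total, grad)]
        local = [w - local_lr * g for w, g in zip(local, grad)]
    return total


def merge(model, cumulative_gradients, lr=0.1):
    """Server side (hypothetical): aggregate by mean, then merge into the model."""
    n = len(cumulative_gradients)
    aggregated = [sum(column) / n for column in zip(*cumulative_gradients)]
    return [w - lr * a for w, a in zip(model, aggregated)]


def train(model, node_targets, iterations=50):
    """Run training iterations in which nodes keep computing while the server merges."""
    with ThreadPoolExecutor(max_workers=len(node_targets)) as pool:
        # Prime the pipeline: every node computes its first cumulative gradient.
        pending = [pool.submit(cumulative_gradient, model, t) for t in node_targets]
        for _ in range(iterations):
            ready = [f.result() for f in pending]          # obtain
            # Nodes start their next cumulative gradient on the not-yet-updated
            # model; it is merged in the following iteration.
            pending = [pool.submit(cumulative_gradient, model, t) for t in node_targets]
            model = merge(model, ready)                    # create updated model
        for f in pending:                                  # drain the last round
            f.result()
    return model


if __name__ == "__main__":
    print(train([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]]))
```

Because each new cumulative gradient is computed against a model that is one merge behind, the update uses slightly stale gradients. With the small step sizes assumed here the model should still approach the mean of the node targets.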