Systems and methods for training a residual neural network are described. One of the methods includes arranging a plurality of residual units into one or more subsets corresponding to a plurality of warped layers; configuring a parallelizable warp operator to compute an output of a warped layer from an input to the warped layer using a first-order Taylor series approximation; and determining a final parameter setting for a plurality of parameters of the residual neural network by training the residual neural network on a training set, wherein training the residual network comprises applying the parallelizable warp operator to each of the plurality of warped layers.
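
The first-order Taylor series approximation named above can be illustrated with a minimal sketch for a warped layer containing two residual units. It is an assumption-laden illustration, not the claimed implementation: the toy branch F(x) = scale * tanh(x), the helper names (make_residual_unit, sequential_forward, warp_forward_two_units), and the restriction to two units are all hypothetical. Two stacked units give x2 = x0 + F1(x0) + F2(x0 + F1(x0)); expanding F2 to first order around x0 gives x2 ≈ x0 + F1(x0) + F2(x0) + J2(x0) F1(x0), where J2 is the Jacobian of F2.

```python
import math


def make_residual_unit(scale):
    """Toy residual branch F(x) = scale * tanh(x), applied elementwise.

    Hypothetical stand-in for a real residual unit; returns the branch
    together with its Jacobian-vector product.
    """
    def branch(x):
        return [scale * math.tanh(xi) for xi in x]

    def jvp(x, v):
        # The Jacobian is diagonal: dF_i/dx_i = scale * (1 - tanh(x_i)^2).
        return [scale * (1.0 - math.tanh(xi) ** 2) * vi for xi, vi in zip(x, v)]

    return branch, jvp


def sequential_forward(x0, units):
    """Exact forward pass: each unit computes x <- x + F(x) in order."""
    x = list(x0)
    for branch, _ in units:
        x = [xi + fi for xi, fi in zip(x, branch(x))]
    return x


def warp_forward_two_units(x0, unit1, unit2):
    """First-order approximation of two stacked residual units.

    F1(x0), F2(x0) and the Jacobian of F2 are all evaluated at the
    warped layer's input x0, so they do not wait on unit 1's output.
    """
    f1, _ = unit1
    f2, jvp2 = unit2
    f1_x0 = f1(x0)
    f2_x0 = f2(x0)
    correction = jvp2(x0, f1_x0)  # J2(x0) applied to F1(x0)
    return [a + b + c + d for a, b, c, d in zip(x0, f1_x0, f2_x0, correction)]


x0 = [0.5, -1.0, 2.0]
units = [make_residual_unit(0.1), make_residual_unit(0.1)]
exact = sequential_forward(x0, units)
approx = warp_forward_two_units(x0, units[0], units[1])
max_err = max(abs(e - a) for e, a in zip(exact, approx))
```

In this sketch the approximation error is second order in the size of F1(x0), so the two outputs agree closely when the residual branches are small relative to the input.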