The permutation symmetry of the hidden units in multilayer perceptrons causes the saddle structure and plateaus that characterize the learning dynamics of gradient methods. The correlation of the weight vectors in the teacher network is believed to affect this saddle structure and thereby prolong the learning time, but the mechanism has remained unclear. In this paper, we analyze it for soft committee machines under on-line learning, using methods from statistical mechanics. Conventional steepest gradient descent requires a learning time that grows with the correlation of the teacher weight vectors. Natural gradient descent, in contrast, exhibits no plateaus in the limit of small learning rate, even when the weight vectors are strongly correlated, which worsens the singularity of the Fisher information matrix. Analytical results around the saddle point support these dynamics.
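As a concrete illustration of the setting (not the paper's own code), the sketch below implements on-line steepest gradient descent for a soft committee machine student learning from a teacher whose weight vectors are made correlated through a shared component. The activation g(x) = erf(x/√2), the dimensions N and K, the learning rate eta, and the overlap parameter rho are all illustrative assumptions; near the permutation-symmetric saddle the student vectors stay nearly identical for many steps, which is the plateau discussed above.

```python
import numpy as np
from scipy.special import erf

def g(x):
    # Sigmoidal activation commonly used in this literature (assumption)
    return erf(x / np.sqrt(2.0))

def g_prime(x):
    return np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)

rng = np.random.default_rng(0)
N, K = 500, 2        # input dimension, number of hidden units (assumed values)
eta = 0.1            # learning rate (assumed value)
rho = 0.8            # strength of the shared teacher component (assumed value)
steps = 20000

# Correlated teacher: each row mixes a common direction with an independent one,
# so distinct teacher vectors have a nonzero mutual overlap controlled by rho.
common = rng.standard_normal(N)
B = rho * common + np.sqrt(1.0 - rho ** 2) * rng.standard_normal((K, N))
B /= np.linalg.norm(B, axis=1, keepdims=True)  # unit-length teacher vectors

# Student weights; in a soft committee machine the hidden-to-output
# weights are fixed to one, so only J is learned.
J = rng.standard_normal((K, N)) / np.sqrt(N)

for m in range(steps):
    xi = rng.standard_normal(N)   # fresh random example each step (on-line learning)
    t = g(B @ xi).sum()           # teacher output
    x = J @ xi                    # student local fields
    s = g(x).sum()                # student output
    # Plain steepest gradient descent on the squared error, scaled by 1/N
    J += (eta / N) * (t - s) * np.outer(g_prime(x), xi)
```

Tracking the student-student overlaps J_i · J_k over time would show the long plateau under this plain update; the natural gradient variant would precondition the update with the inverse Fisher information matrix, which is exactly what becomes singular when the correlation is strong.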