We revisit the oft-studied asymptotic (in sample size) behavior of the parameter or weight estimate returned by any member of a large family of neural network training algorithms. By properly accounting for the characteristic property of neural networks that their empirical and generalization errors possess multiple minima, we rigorously establish conditions under which the parameter estimate converges strongly to the set of minima of the generalization error.
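As a hedged formal sketch of the kind of statement at issue (the notation here is ours, not the paper's): writing $\hat{\theta}_n$ for the estimate after $n$ samples, $L(\theta)$ for the generalization error, and $\Theta^\ast = \arg\min_\theta L(\theta)$ for its set of minima, strong convergence into a set is typically expressed as

$$\operatorname{dist}\bigl(\hat{\theta}_n,\,\Theta^\ast\bigr) \;\longrightarrow\; 0 \quad \text{almost surely as } n \to \infty,$$

where $\operatorname{dist}(\theta, \Theta^\ast) = \inf_{\theta' \in \Theta^\ast} \lVert \theta - \theta' \rVert$. Because $\Theta^\ast$ need not be a single point when multiple minima exist, convergence is naturally stated to the set rather than to any individual minimizer.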