We study the problem of learning with incomplete information in a student–teacher setup for the committee machine. The learning algorithm combines unsupervised Hebbian learning of a series of associations with a delayed reinforcement step, in which the set of previously learnt associations is partly and indiscriminately unlearnt, to an extent that depends on the success rate of the student on these previously learnt associations. The relevant learning parameter λ represents the strength of Hebbian learning. A coarse-grained analysis of the system yields a set of differential equations for the overlaps of student and teacher weight vectors, whose solutions provide a complete description of the learning behavior. The analysis reveals complicated dynamics, showing that perfect generalization can be obtained if the learning parameter exceeds a threshold λ_c and if the initial value of the overlap between student and teacher weights is non-zero. When the dynamics converge, the generalization error decays as a power law in the number of examples used in training, with an exponent that depends on λ. An investigation of the system flow in a subspace with broken permutation symmetry between hidden units reveals a bifurcation point λ* above which perfect generalization does not depend on initial conditions. Finally, we demonstrate that cases of a complexity mismatch between student and teacher are optimally resolved, in the sense that an over-complex student can emulate a less complex teacher rule, while an under-complex student reaches a state that realizes the minimal generalization error compatible with the complexity mismatch.
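The learning rule summarized above can be illustrated with a minimal sketch of a single learning episode. The abstract does not give the update equations, so the details here are assumptions made for illustration only: the committee output is a majority vote over hidden units, each hidden unit reinforces its own output during the Hebbian phase, the Hebbian step has strength `lam` against an unlearning step of unit strength, and the teacher reports only the error rate over the episode. All function and parameter names (`committee_output`, `train_episode`, `random_committee`, `lam`, `L`) are hypothetical.

```python
import random


def sign(x):
    """Sign function with sign(0) = +1."""
    return 1 if x >= 0 else -1


def committee_output(weights, xi):
    """Majority vote over hidden-unit signs (assumes an odd number of units)."""
    hidden = [sign(sum(w * x for w, x in zip(wk, xi))) for wk in weights]
    return sign(sum(hidden)), hidden


def random_committee(K, N, rng):
    """K hidden units with N Gaussian weights each."""
    return [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]


def train_episode(student, teacher, lam, L, rng):
    """One episode: L unsupervised Hebbian steps, then one delayed
    unlearning step scaled by the error rate e over those L examples.

    Assumed form of the rule (not taken from the paper):
      Hebbian:    J_k += lam * s_k * xi / sqrt(N)   (s_k = student hidden output)
      unlearning: J_k -= e * sum_l s_k^l * xi^l / sqrt(N)
    The student is modified in place; the error rate e is returned.
    """
    K, N = len(student), len(student[0])
    scale = N ** 0.5
    hebb = [[0.0] * N for _ in range(K)]  # accumulated Hebbian increments
    errors = 0
    for _ in range(L):
        xi = [rng.choice((-1.0, 1.0)) for _ in range(N)]
        s_out, s_hidden = committee_output(student, xi)
        t_out, _ = committee_output(teacher, xi)
        # The student never sees t_out directly, only the final error rate.
        errors += int(s_out != t_out)
        for k in range(K):
            for i in range(N):
                inc = s_hidden[k] * xi[i] / scale
                student[k][i] += lam * inc
                hebb[k][i] += inc
    e = errors / L
    # Indiscriminate partial unlearning of everything learnt in this episode.
    for k in range(K):
        for i in range(N):
            student[k][i] -= e * hebb[k][i]
    return e
```

Under these assumptions the net change over an episode is proportional to (λ − e) times the accumulated Hebbian term, so whether an episode is reinforced or unlearnt depends on how the error rate compares with λ. Calling `train_episode` repeatedly on a student created with `random_committee` and recording the returned error rate gives a rough numerical picture of the dynamics; it is not a substitute for the differential-equation analysis described in the abstract.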