
Learning with incomplete information and the mathematical structure behind it


Abstract

We investigate the problem of learning with incomplete information, as exemplified by learning with delayed reinforcement. We study a two-phase learning scenario in which a phase of Hebbian associative learning based on momentary internal representations is supplemented by an 'unlearning' phase that depends on a graded reinforcement signal. The reinforcement signal quantifies the success rate globally over a number of learning steps in phase one, and 'unlearning' is indiscriminate with respect to the associations learnt in that phase. Learning according to this model is studied via simulations and analytically within a student-teacher scenario, both for single-layer networks and for a committee machine. Success and speed of learning depend on the ratio lambda of the learning rate used for the associative Hebbian learning phase to the learning rate used for the unlearning correction in response to the reinforcement signal. Asymptotically perfect generalization is possible only if this ratio exceeds a critical value lambda_c, in which case the generalization error exhibits a power-law decay with the number of examples seen by the student, with an exponent that depends in a non-universal manner on the parameter lambda. We find these features to be robust against a wide spectrum of modifications of microscopic modelling details. Two illustrative applications are also provided: a robot learning to navigate a field containing obstacles, and the problem of identifying a specific component in a collection of stimuli.
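The two-phase scheme summarized in the abstract can be sketched for a single-layer (perceptron) student. This is one illustrative reading, not the authors' implementation. The exact update form, the Gaussian inputs, the 1/sqrt(N) scaling, the use of the error fraction as the graded reinforcement signal, and all names (`two_phase_round`, `a1`, `a2`, `L`) are assumptions made for this example.

```python
import math
import random


def sign(v):
    """Sign function with the convention sign(0) = +1."""
    return 1.0 if v >= 0 else -1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def generalization_error(J, B):
    """Disagreement probability of two perceptrons on isotropic inputs:
    the angle between the weight vectors divided by pi."""
    cos = dot(J, B) / math.sqrt(dot(J, J) * dot(B, B))
    return math.acos(max(-1.0, min(1.0, cos))) / math.pi


def two_phase_round(J, B, a1, a2, L, rng):
    """One round of the assumed two-phase rule.

    Phase 1: L Hebbian steps that reinforce the student's own output s
             (its momentary internal representation), with rate a1.
    Phase 2: indiscriminate 'unlearning' of everything added in phase 1,
             scaled by rate a2 and by the graded signal e (here assumed to
             be the fraction of steps where the student disagreed with
             the teacher B).
    Returns the updated weights and the signal e.
    """
    N = len(J)
    scale = 1.0 / math.sqrt(N)
    J = list(J)              # do not modify the caller's list
    trace = [0.0] * N        # accumulates s * x over phase 1
    errors = 0
    for _ in range(L):
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        s = sign(dot(J, x))              # student's own answer
        if s != sign(dot(B, x)):         # teacher's answer is used only for scoring
            errors += 1
        for i in range(N):
            J[i] += a1 * scale * s * x[i]
            trace[i] += s * x[i]
    e = errors / L                       # graded, global reinforcement signal
    for i in range(N):
        J[i] -= a2 * scale * e * trace[i]
    return J, e


if __name__ == "__main__":
    rng = random.Random(1)
    N = 50
    B = [rng.gauss(0.0, 1.0) for _ in range(N)]   # teacher
    J = [rng.gauss(0.0, 1.0) for _ in range(N)]   # student
    for q in range(200):
        J, e = two_phase_round(J, B, a1=1.0, a2=4.0, L=10, rng=rng)
    print("generalization error:", generalization_error(J, B))
```

In this sketch the ratio lambda from the abstract corresponds to `a1 / a2`. Running the loop with different values of that ratio is one way to explore how the outcome depends on it, although this toy setup makes no claim to reproduce the reported critical value or exponents.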
