IEEE Transactions on Neural Networks

Integrating Temporal Difference Methods and Self-Organizing Neural Networks for Reinforcement Learning With Delayed Evaluative Feedback

Abstract

This paper presents a neural architecture for learning category nodes that encode mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state–action space, estimated through on-policy and off-policy TD learning methods, specifically state–action–reward–state–action (SARSA) and Q-learning. The learned value functions are then used to determine the optimal actions based on an action selection policy. We have developed TD-FALCON systems using various TD learning strategies and compared their performance in terms of task completion, learning speed, and time and space efficiency. Experiments based on a minefield navigation task have shown that TD-FALCON systems are able to learn effectively with both immediate and delayed reinforcement and achieve stable performance at a pace much faster than standard gradient-descent-based reinforcement learning systems.
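To make the on-policy/off-policy distinction between SARSA and Q-learning concrete, here is a minimal sketch of the standard tabular forms of both updates plus an ε-greedy action selection policy. It is not TD-FALCON itself: the value function here is a plain lookup table rather than fusion ART category nodes, and the function names, the ε-greedy policy, the learning rate `alpha`, and the discount factor `gamma` are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict


def td_target_sarsa(q, reward, next_state, next_action, gamma):
    # On-policy (SARSA): bootstrap from the action the policy actually chose next.
    return reward + gamma * q[(next_state, next_action)]


def td_target_q_learning(q, reward, next_state, actions, gamma):
    # Off-policy (Q-learning): bootstrap from the best-valued next action,
    # regardless of which action the agent will actually take.
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def td_update(q, state, action, target, alpha):
    # Move Q(s, a) a fraction alpha of the way toward the TD target.
    td_error = target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    # Hypothetical action selection policy: explore with probability epsilon,
    # otherwise pick the action with the highest current estimate.
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Usage: unseen state-action pairs default to a value of 0.0.
q = defaultdict(float)
q[("s1", "left")], q[("s1", "right")] = 0.5, 1.0
actions = ["left", "right"]

# Suppose the agent took "left" in s0, received reward 0, and landed in s1,
# where its policy then chose "left" again.
print(td_target_sarsa(q, 0.0, "s1", "left", gamma=0.9))        # 0.45
print(td_target_q_learning(q, 0.0, "s1", actions, gamma=0.9))  # 0.9
```

The two methods differ only in the bootstrap term: SARSA uses the value of the next action actually selected by the policy, whereas Q-learning uses the maximum over all next actions, which is why the same transition yields 0.45 for SARSA but 0.9 for Q-learning in this example.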