Integrating Temporal Difference Methods and Self-Organizing Neural Networks for Reinforcement Learning With Delayed Evaluative Feedback

Tan A.-H.; Lu N.; Xiao D.

Abstract

This paper presents a neural architecture for learning category nodes that encode mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state–action space, estimated through on-policy and off-policy TD learning methods, specifically state–action–reward–state–action (SARSA) and Q-learning. The learned value functions are then used to determine the optimal actions based on an action selection policy. We have developed TD-FALCON systems using various TD learning strategies and compared their performance in terms of task completion, learning speed, and time and space efficiency. Experiments based on a minefield navigation task have shown that TD-FALCON systems are able to learn effectively with both immediate and delayed reinforcement and achieve stable performance at a pace much faster than that of standard gradient-descent-based reinforcement learning systems.
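
The difference between the on-policy (SARSA) and off-policy (Q-learning) TD methods named in the abstract can be illustrated with a minimal tabular sketch. This is not the TD-FALCON architecture, which stores value estimates in ART-based category nodes; a plain lookup table is swapped in here for illustration. The function names, the state and action labels, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstraps from the action actually taken next."""
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q, reward, next_state, actions, gamma):
    """Off-policy target: bootstraps from the highest-valued next action."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the TD target."""
    td_error = target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical toy values: two actions, one successor state "s1".
actions = ["left", "right"]
q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
q[("s1", "left")] = 0.2
q[("s1", "right")] = 0.8

# Delayed feedback: the intermediate step itself yields no reward.
sarsa_t = sarsa_target(q, reward=0.0, next_state="s1",
                       next_action="left", gamma=0.9)
qlearn_t = q_learning_target(q, reward=0.0, next_state="s1",
                             actions=actions, gamma=0.9)

# Apply the Q-learning target to the pair ("s0", "right").
error = td_update(q, "s0", "right", qlearn_t, alpha=0.5)
print(sarsa_t, qlearn_t, q[("s0", "right")])
```

With these toy numbers the SARSA target is about 0.18 and the Q-learning target about 0.72 (up to floating-point rounding), because SARSA follows the exploratory action "left" while Q-learning assumes the greedy action "right". In both cases a zero immediate reward still yields a non-zero update, which is how TD methods propagate delayed feedback back to earlier state-action pairs.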

Bibliographic details

Source: IEEE Transactions on Neural Networks, 2008, No. 2, pp. 230-244 (15 pages)
Authors: Tan A.-H.; Lu N.; Xiao D.
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology
Keywords: Reinforcement learning; self-organizing neural networks (NNs); temporal difference (TD) methods

