Finding intrinsic rewards by embodied evolution and constrained reinforcement learning.

Uchibe E; Doya K

Abstract

Understanding the design principle of reward functions is a substantial challenge both in artificial intelligence and neuroscience. Successful acquisition of a task usually requires not only rewards for goals, but also for intermediate states to promote effective exploration. This paper proposes a method for designing 'intrinsic' rewards of autonomous agents by combining constrained policy gradient reinforcement learning and embodied evolution. To validate the method, we use Cyber Rodent robots, in which collision avoidance, recharging from battery packs, and 'mating' by software reproduction are three major 'extrinsic' rewards. We show in hardware experiments that the robots can find appropriate 'intrinsic' rewards for the vision of battery packs and other robots to promote approach behaviors.
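The two-level idea in the abstract, an outer evolutionary search over intrinsic reward parameters and an inner learner that trains on extrinsic plus intrinsic reward, can be illustrated with a toy sketch. This is not the authors' algorithm. Tabular Q-learning stands in for constrained policy gradient learning (the constraint handling is omitted), and a centralized truncation-selection loop stands in for embodied evolution on physical robots. The corridor world, the `vision` feature, the weight `w`, and all function names and hyperparameters are hypothetical.

```python
import random

N_STATES = 8          # hypothetical corridor; the "battery" sits in the last cell
ACTIONS = (-1, +1)    # step left or step right


def vision(state):
    """Hypothetical visual feature: apparent size of the battery, in [0, 1]."""
    return state / (N_STATES - 1)


def run_lifetime(w, rng, episodes=20, max_steps=30, alpha=0.5, gamma=0.9, eps=0.1):
    """Learn on extrinsic + intrinsic reward; return the extrinsic reward collected."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    fitness = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Epsilon-greedy action choice, with random tie-breaking.
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            done = s2 == N_STATES - 1
            r_ext = 1.0 if done else 0.0   # extrinsic: reaching the battery
            r_int = w * vision(s2)         # intrinsic: weighted vision of the battery
            target = r_ext + r_int + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            fitness += r_ext               # fitness counts extrinsic reward only
            s = s2
            if done:
                break
    return fitness


def evolve(generations=10, pop_size=8, seed=0):
    """Truncation selection with Gaussian mutation over the intrinsic weight w."""
    rng = random.Random(seed)
    population = [rng.uniform(-1.0, 1.0) for _ in range(pop_size)]
    best_w, best_fit = population[0], float("-inf")
    for _ in range(generations):
        scored = sorted(((run_lifetime(w, rng), w) for w in population), reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_w = scored[0]
        parents = [w for _, w in scored[: pop_size // 2]]
        population = parents + [p + rng.gauss(0.0, 0.2) for p in parents]
    return best_w, best_fit


if __name__ == "__main__":
    w, fit = evolve()
    print(f"best intrinsic weight {w:.2f}, extrinsic reward {fit:.0f}")
```

The design point the sketch tries to capture is that fitness is measured on extrinsic reward alone, so an intrinsic weight survives selection only if it helps the learner reach the goal.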


Bibliographic record

Source
Neural Networks: The Official Journal of the International Neural Network Society | 2008, Issue 10 | 9 pages
Authors
Uchibe E; Doya K
Original format: PDF
Language: English
Chinese Library Classification: Neurology
Keywords
Reinforcement (Psychology); Learning; Evolution; Electrical battery; findings

Similar literature

1. Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. [J]. Uchibe E, Doya K. Neural Networks: The Official Journal of the International Neural Network Society. 2008, Issue 10.
2. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. [J]. Glascher J, Daw N, Dayan P. Neuron. 2010, Issue 4.
3. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. [J]. Haruno M, Kawato M. Neural Networks: The Official Journal of the International Neural Network Society. 2006, Issue 8.
4. Finding Exploratory Rewards by Embodied Evolution and Constrained Reinforcement Learning in the Cyber Rodents. [C]. Eiji Uchibe, Kenji Doya. International Conference on Neural Information Processing; ICONIP 2007. 2008.
5. Pain-Inspired Intrinsic Reward For Deep Reinforcement Learning. [D]. Richardson, Trevor Woods. 2018.
6. Frequency of reinforcement as a determinant of extinction-induced aggression during errorless discrimination learning. [O]. M Rilling, H J Caplan. 1975.
7. Reward function and initial values: Better choices for accelerated Goal-directed reinforcement learning. [O]. Matignon, Laëtitia; Laurent, Guillaume; Le Fort-Piat, Nadine. 2006.
8. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance. [R]. Knox, W. B.; Stone, P. 2014.
