2011 IEEE International Conference on Systems, Man, and Cybernetics

Reinforcement learning with nonstationary reward depending on the episode



Abstract

A model that represents nonstationary reward is proposed for reinforcement learning (RL). RL is a framework in which an agent learns through interaction with an environment: the agent receives a reward and learns its behavior from it. The reward is specified by the designer; because the agent's behavior itself does not have to be designed, RL is expected to apply to a wide range of problems. However, conventional RL algorithms work under the assumption that the environment is stationary, so they cannot accommodate nonstationary rewards or a change of objective. From the standpoint of real-world applications, the agent must be able to cope with such a change of objective. This paper proposes a learning technique that handles temporal change of the reward: in the proposed representation, the reward is divided into two parts, an episode-dependent part and an episode-independent part. Simulation experiments show the effectiveness of the proposed method.
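The abstract gives no code, but the reward decomposition it describes can be sketched concretely. Below is a minimal illustration in which the reward of a toy chain environment is split into an episode-independent step cost and an episode-dependent goal bonus whose location changes with the episode index. The chain environment, the 50-episode switching schedule, and all names are illustrative assumptions, and the learner is plain tabular Q-learning used as a stand-in, not the authors' algorithm.

```python
import numpy as np

N_STATES = 5            # chain of states 0..4; the agent starts in the middle
N_ACTIONS = 2           # 0 = move left, 1 = move right
N_EPISODES = 200
MAX_STEPS = 20

def episode_independent_reward(state):
    # Stationary part of the reward: a small step cost everywhere.
    return -0.1

def episode_dependent_reward(state, episode):
    # Nonstationary part: the rewarding goal alternates between the two
    # ends of the chain every 50 episodes (an assumed schedule).
    goal = 0 if (episode // 50) % 2 == 0 else N_STATES - 1
    return 1.0 if state == goal else 0.0

def reward(state, episode):
    # The decomposition from the abstract:
    # reward = episode-independent part + episode-dependent part.
    return episode_independent_reward(state) + episode_dependent_reward(state, episode)

# Plain tabular Q-learning as a stand-in learner.
rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(N_EPISODES):
    state = N_STATES // 2
    for _ in range(MAX_STEPS):
        # epsilon-greedy action selection
        if rng.random() < eps:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        r = reward(next_state, episode)
        done = episode_dependent_reward(next_state, episode) > 0.0  # reached the current goal
        target = r + (0.0 if done else gamma * float(np.max(Q[next_state])))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        if done:
            break

print(Q)
```

In this sketch the goal swaps at fixed episode intervals, so a learner that conditions the episode-dependent part on the episode index could in principle anticipate the change, whereas a purely stationary learner must relearn its policy after every switch.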
