Improving reinforcement learning using temporal-difference network EUROCON2009

Abstract

Reinforcement learning is one of the popular learning methods for problems across many different domains. A key concern for this method is how quickly and efficiently it can learn a new problem. In this paper, we present a new approach that increases the efficiency of reinforcement learning with the help of a predictive model of the problem's environment, called a temporal-difference (TD) network, along with observation. This TD network is fed with knowledge extracted from another problem with the same task using a TD network. First, a reinforcement-learning agent learns its environment for the task of wall following. We then train a temporal-difference network (TDN) with intervening observation in the brain of the agent to obtain a predictive model of the environment. Next, the most promising action-observation sequences of the given environment are extracted as knowledge to strengthen reinforcement learning in a new environment. Finally, this knowledge helps the reinforcement procedure produce more efficient results.
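
The knowledge-extraction and transfer steps described in the abstract might look roughly like the following Python sketch. It is not the authors' implementation and does not include a TD network. It assumes that logged episodes of (action, observation, reward) steps are available from the source environment, that "most promising" means highest mean reward over a fixed-length window, and that transfer is done by adding a bonus to initial Q-values. The function names `extract_promising_sequences` and `seed_q_table`, and the parameters `length`, `top_k`, and `bonus`, are hypothetical.

```python
from collections import defaultdict


def extract_promising_sequences(episodes, length=2, top_k=3):
    """Rank fixed-length action-observation sequences by mean reward.

    episodes: list of episodes, each a list of (action, observation, reward).
    Assumption: "promising" is approximated by the mean summed reward
    of the window; the paper may use a different criterion.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        for start in range(len(episode) - length + 1):
            window = episode[start:start + length]
            key = tuple((action, obs) for action, obs, _ in window)
            totals[key] += sum(reward for _, _, reward in window)
            counts[key] += 1
    mean_reward = {key: totals[key] / counts[key] for key in totals}
    # Sort by descending mean reward; break ties by the sequence itself.
    ranked = sorted(mean_reward, key=lambda key: (-mean_reward[key], key))
    return ranked[:top_k]


def seed_q_table(sequences, bonus=1.0):
    """Build initial Q-values for a new environment from extracted sequences.

    Hypothetical transfer rule: if a sequence shows observation `obs`
    followed by action `next_action`, raise Q[(obs, next_action)] by `bonus`.
    """
    q_values = defaultdict(float)
    for sequence in sequences:
        for (_, obs), (next_action, _) in zip(sequence, sequence[1:]):
            q_values[(obs, next_action)] += bonus
    return q_values


if __name__ == "__main__":
    # Toy wall-following log; the action and observation labels are made up.
    logged = [
        [("forward", "wall_right", 1.0), ("forward", "wall_right", 1.0),
         ("left", "no_wall", -1.0)],
        [("forward", "wall_right", 1.0), ("forward", "wall_right", 1.0)],
    ]
    best = extract_promising_sequences(logged, length=2, top_k=1)
    print(best)
    print(dict(seed_q_table(best)))
```

A standard Q-learning loop in the new environment could then start from the seeded table instead of zeros, so that action choices favored by the source environment are tried first.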

Bibliographic record

Source
EUROCON 2009, EUROCON '09 | 2009 | pp. 1716-1722 | 7 pages
Conference location: Saint Petersburg (RU)
Authors
Karbasian H.; Ahmadabadi M.N.; Araabi B.N.
Author affiliation

Control Intell. Process. Center of Excellence, Univ. of Tehran, Tehran, Iran;

Original format: PDF
Keywords
Markov processes; decision theory; intelligent robots; knowledge acquisition; learning (artificial intelligence); learning systems; mobile robots; predictive control; POMDP; TD network; knowledge extraction; partially observable Markov decision process; predictive model; reinforcement learning agent method; robot wall-following; temporal-difference network; Concept; MDP; Reinforcement Learning;

Similar literature

1. Correlation minimizing replay memory in temporal-difference reinforcement learning [J]. Ramicic Mirza, Bonarini Andrea. Neurocomputing. 2020, issue Juna14
2. Optimal Bidding and Operation of a Power Plant with Solvent-Based Carbon Capture under a CO2 Allowance Market: A Solution with a Reinforcement Learning-Based Sarsa Temporal-Difference Algorithm [J]. Ziang Li, Zhengtao Ding, Meihong Wang. Engineering (English edition). 2017, issue 002
3. Improving primary frequency response in networked microgrid operations using multilayer perceptron-driven reinforcement learning [J]. Radhakrishnan Nikitha, Chakraborty Indrasis, Xie Jing. IET Smart Grid. 2020, issue 4
4. IMPROVING REINFORCEMENT LEARNING USING TEMPORAL-DIFFERENCE NETWORK EUROCON2009 [C]. Habib Karbasian, Majid N. Ahmadabadi, Babak N. Araabi. International Conference Devoted to the Anniversary of Alexander Popov. 2009
5. Using reinforcement learning to improve network durability [D]. Hammel, Erik. 2013
6. Temporal-Difference Reinforcement Learning with Distributed Representations [O]. Zeb Kurth-Nelson, A. David Redish. 2009
7. Correlation minimizing replay memory in temporal-difference reinforcement learning [O]. Mirza Ramicic, Andrea Bonarini. 2020
