Speeding up autonomous learning by using state-independent option policies and termination improvement

Abstract

In Reinforcement Learning applications such as autonomous robot navigation, the use of options (macro-operators) instead of low-level actions has been reported to speed up learning through a more aggressive exploration of the state space. In this paper we present an evaluation of the use of option policies O_S. Each option policy in this framework is a fixed sequence of actions that depends exclusively on the state in which the option is initiated. This contrasts with option policies O_Π, which are more common in the literature and correspond to action sequences that depend on the states visited during the execution of the options. One of our goals was to analyse the effects of varying the action sequence length for O_S policies. The main contribution of the paper, however, is a study of a Termination Improvement technique that allows option execution to be aborted if a more promising option is found. Experimental results show that Termination Improvement for O_S options, whose benefits had already been reported for O_Π options, can be much more effective than indiscriminately increasing the option size in order to increase exploration of the state space, because it adapts the size of the action sequence to the state where the option is initiated.
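
The interruption idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `run_option`, the corridor environment `corridor_step`, the option names, and the hand-picked Q-values are all hypothetical. The check compares the tabulated value of the running option at the current state with the best option value available there, which is the rule as usually stated for state-dependent options; how the paper evaluates a partially executed fixed sequence is not given in the abstract.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = int
QTable = Dict[Tuple[State, str], float]


def run_option(
    state: State,
    name: str,
    actions: List[Action],
    step: Callable[[State, Action], State],
    q: QTable,
    option_names: List[str],
) -> Tuple[State, int]:
    """Run a fixed action sequence; stop early if another option looks better.

    Returns the state reached and the number of primitive actions executed.
    """
    executed = 0
    for action in actions:
        if executed > 0:
            # Termination improvement (assumed form): compare the running
            # option with the best option available in the current state.
            best = max(q.get((state, o), 0.0) for o in option_names)
            if q.get((state, name), 0.0) < best:
                break
        state = step(state, action)
        executed += 1
    return state, executed


def corridor_step(state: State, action: Action) -> State:
    """Hypothetical 1-D corridor with cells 0..9; action is -1 or +1."""
    return min(9, max(0, state + action))


# Hand-picked illustrative values, not results from the paper.
q: QTable = {
    (1, "right"): 0.5, (1, "left"): 0.1,
    (2, "right"): 0.2, (2, "left"): 0.9,
}

# A four-step "right" sequence started in cell 0 is cut short in cell 2,
# where the table rates "left" above "right".
final_state, steps = run_option(
    0, "right", [+1] * 4, corridor_step, q, ["right", "left"]
)
```

With an empty Q-table every option ties at 0.0, so the sequence runs to its full length. In this sketch the effective length of an option is therefore decided by the value estimates at the states it passes through rather than by a fixed size chosen in advance.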

Bibliographic details

Source
Brazilian Symposium on Neural Networks | 2002 | 6 pages
Authors
Leticia Maria Friske; Carlos Henrique Costa Ribeiro
Original format: PDF
Chinese Library Classification: TP183-53
