
Learning State Features from Policies to Bias Exploration in Reinforcement Learning


Abstract

When given several problems to solve in some domain, a standard reinforcement learner learns an optimal policy from scratch for each problem. If the domain has particular characteristics that are goal- and problem-independent, the learner might be able to take advantage of previously solved problems. Unfortunately, it is generally infeasible to directly apply a learned policy to new problems. This paper presents a method to bias exploration using previous problem solutions, which is shown to speed up learning on new problems. We first allow a Q-learner to learn the optimal policies for several problems. We describe each state in terms of local features, assuming that these state features, together with the learned policies, can be used to abstract the domain characteristics away from the specific layout of states and rewards in a particular problem. We then use a classifier to learn this abstraction, with training examples extracted from each learned Q-table. The trained classifier maps state features to the potentially goal-independent successful actions in the domain. Given a new problem, we include the output of the classifier as an exploration bias to improve the rate of convergence of the reinforcement learner. We have validated our approach empirically. In this paper, we report results within the complex domain of Sokoban, which we introduce.
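The abstract describes a pipeline: solve several problems with Q-learning, turn each learned Q-table into training examples mapping local state features to the greedy action, fit a classifier on those examples, and use the classifier's prediction to bias exploration on a new problem. The sketch below illustrates one way that pipeline could look; the action set, the majority-vote "classifier", the feature extractor, and the bias probability are all illustrative assumptions, not the paper's actual choices.

```python
import random
from collections import defaultdict

# Hypothetical action set for a grid-like domain such as Sokoban.
ACTIONS = ["up", "down", "left", "right"]

def classifier_from_q_tables(q_tables, extract_features):
    """Turn each solved problem's Q-table (a dict keyed by (state, action))
    into training examples that map local state features to the greedy
    action, then fit a majority-vote rule over those examples. The
    majority vote stands in for whatever classifier the paper trains."""
    votes = defaultdict(lambda: defaultdict(int))
    for q in q_tables:
        for state in {s for (s, _) in q}:
            greedy = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            votes[extract_features(state)][greedy] += 1
    return {feats: max(acts, key=acts.get) for feats, acts in votes.items()}

def biased_epsilon_greedy(q, state, feats, classifier, eps=0.2, p_bias=0.5):
    """Epsilon-greedy action selection whose exploratory steps are biased:
    with probability p_bias, exploration follows the classifier's
    suggestion for these state features instead of a uniform random
    action. eps and p_bias are assumed values, not the paper's."""
    if random.random() < eps:  # explore
        suggestion = classifier.get(feats)
        if suggestion is not None and random.random() < p_bias:
            return suggestion              # classifier-biased exploration
        return random.choice(ACTIONS)      # unbiased exploration
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit
```

On a new problem, the learner would call biased_epsilon_greedy in place of its usual epsilon-greedy step; as the new Q-values converge, the exploit branch dominates and the bias naturally fades, so a misleading suggestion cannot prevent convergence.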
