Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning

Doshi-Velez F.; Pfau D.; Wood F.; Roy N.

Abstract

Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.

Bibliographic details

Source
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2015, no. 2 | pp. 394-407 | 14 pages
Authors
Doshi-Velez F.; Pfau D.; Wood F.; Roy N.;
Affiliation

Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA;

Original format: PDF
Language: English
Keywords
Bayes methods; Computational modeling; Hidden Markov models; History; Knowledge representation; Learning (artificial intelligence); Markov processes; Artificial intelligence; HDP-HMM; POMDP; Reinforcement Learning; hierarchical Dirichlet process hidden Markov model; machine learning; partially-observable Markov decision process; reinforcement learning;

Similar literature

1. A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments [J]. David Vengerov. Future Generation Computer Systems, 2008, no. 7
2. A Reinforcement Learning Scheme for a Partially-Observable Multi-Agent Game [J]. Shin Ishii, Hajime Fujita, Masaoki Mitsutake. Machine Learning, 2005, no. 1-2
3. Discussion of 'Nonparametric Bayesian Inference in Applications': Bayesian nonparametric methods in econometrics [J]. Jim Griffin, Maria Kalli, Mark Steel. Statistical Methods and Applications, 2018, no. 2
4. A Multi-Agent Reinforcement Learning Method for a Partially-Observable Competitive Game [C]. Yoichiro Matsuno, Tatsuya Yamazaki, Shin Ishii. 5th International Conference on Autonomous Agents, May 28 - Jun 1, 2001, Montreal, Canada. 2001
5. Bayesian Nonparametric Reinforcement Learning in LTE and Wi-Fi Coexistence [D]. Shih, Po-Kan. 2021
6. A nonparametric Bayesian method of translating machine learning scores to probabilities in clinical decision support [O]. Brian Connolly, K. Bretonnel Cohen, Daniel Santel. 2017
7. Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning [O]. Doshi-Velez Finale P., Pfau David, Wood Frank. 2013
