Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning

Abstract

Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.
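The claim that a nonparametric representation can "scale gracefully with the complexity in the data" can be illustrated with a small sketch. The abstract does not say which nonparametric priors the article uses, so the example below assumes a Chinese restaurant process (CRP), a standard Bayesian nonparametric prior over partitions, purely for illustration. The function name `sample_crp_tables` and the parameter `alpha` (the concentration parameter) are hypothetical names chosen here, not taken from the article.

```python
import random


def sample_crp_tables(num_customers, alpha, seed=0):
    """Sample table sizes from a Chinese restaurant process (illustrative only).

    Customer n (0-indexed) joins existing table k with probability
    size_k / (n + alpha) and opens a new table with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []
    for n in range(num_customers):
        # Draw a point on [0, n + alpha]; the first n units belong to
        # existing tables (proportional to size), the rest to a new table.
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                table_sizes[k] += 1
                break
        else:
            # No existing table was selected: open a new one.
            table_sizes.append(1)
    return table_sizes


if __name__ == "__main__":
    # The number of occupied tables (clusters) is not fixed in advance;
    # it can only grow as more data arrive.
    for n in (10, 100, 1000):
        print(n, len(sample_crp_tables(n, alpha=1.0)))
```

In this sketch the number of clusters is not fixed ahead of time; it is determined by the data seen so far, which is the sense in which such a model's complexity adapts to the data. How the article's own models realize this for partially observable domains is not described in the abstract.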
