Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Abstract

Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive.

In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More important for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
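The abstract does not give the Bayes-risk formula, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the model uncertainty is represented by a small set of weighted candidate models, that each candidate already supplies a value estimate for every action at the current belief, and that a policy query is triggered when even the least risky action exceeds a fixed cost. The names `bayes_risk_action`, `q_values`, `weights`, and `query_cost` are hypothetical.

```python
def bayes_risk_action(q_values, weights, query_cost):
    """Pick the action with the smallest expected loss over candidate models.

    q_values[i][a]: assumed value of action a under candidate model i.
    weights[i]:     unnormalized posterior weight of candidate model i.
    query_cost:     hypothetical threshold above which we ask the expert.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    n_actions = len(q_values[0])

    # Risk of an action: expected shortfall relative to the best action
    # under each candidate model, averaged over the model weights.
    risk = [0.0] * n_actions
    for q_row, p in zip(q_values, probs):
        best_value = max(q_row)
        for a in range(n_actions):
            risk[a] += p * (best_value - q_row[a])

    best_action = min(range(n_actions), key=lambda a: risk[a])

    # If even the safest action is too risky, fall back to a policy query.
    if risk[best_action] > query_cost:
        return "query", risk
    return best_action, risk


# Two models agree that action 0 is best: act without asking.
choice, risk = bayes_risk_action([[10.0, 0.0], [9.0, 1.0]], [0.5, 0.5], 1.0)
print(choice, risk)

# Two models disagree strongly: every action is risky, so query the expert.
choice2, risk2 = bayes_risk_action([[10.0, -10.0], [-10.0, 10.0]], [1.0, 1.0], 5.0)
print(choice2, risk2)
```

In this sketch the expert is consulted only when the candidate models disagree enough that no action is safe, which mirrors the abstract's point that a policy query can reveal a pitfall without the agent experiencing it.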

Bibliographic record

Authors: Joelle Pineau; Finale P. Doshi-Velez; Nicholas Roy
Year: 2012
Original format: PDF
Language: en_US

Similar literature

1. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs [J]. Finale Doshi-Velez, Joelle Pineau, Nicholas Roy. Artificial Intelligence, 2012.
2. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning [J]. Naoto Horie, Tohgoroh Matsui, Koichi Moriyama. Artificial Life and Robotics, 2019, no. 3.
3. Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration With Application to Autonomous Sequential Repair Problems [J]. Bhattacharya Sushmita, Badyal Sahil, Wheeler Thomas. IEEE Robotics and Automation Letters, 2020, no. 3.
4. Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs [C]. Finale Doshi, Joelle Pineau, Nicholas Roy. International Conference on Machine Learning, 2008.
5. Reinforcement Learning and Recurrent Reinforcement Learning for Dynamic Portfolio Optimization [D]. Almahdi, Saud. 2019.
6. Reinforcement Learning with Limited Reinforcement: Using Bayes Risk for Active Learning in POMDPs [O]. Finale Doshi, Joelle Pineau, Nicholas Roy.
7. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs [O]. Doshi-Velez Finale, Pineau Joelle, Roy Nicholas. 2012.
