Starting AI Researchers' Symposium

POMDP solving: what rewards do you really expect at execution?



Abstract

Partially Observable Markov Decision Processes have gained increasing interest in many research communities, due to substantial improvements in their optimization algorithms and in computing capabilities. Yet most research focuses on optimizing either average accumulated rewards (AI planning) or direct entropy (active perception), whereas neither criterion matches the rewards actually gathered at execution. Indeed, the first optimization criterion linearly averages over all belief states, so it does not extract the best information from different observations, while the second one discards rewards entirely. Thus, motivated by simple demonstrative examples, we study an additive combination of these two criteria to get the best of both reward gathering and information acquisition at execution. We then compare our criterion with classical ones on a realistic multi-target recognition and tracking mission, and highlight the need to consider new hybrid non-linear criteria.
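To make the additive combination concrete, the Python sketch below mixes the standard belief-averaged reward with a negative-entropy information term. This is only an illustration under assumptions, not the authors' implementation: the names hybrid_value and belief_entropy, the trade-off weight lam, and the toy numbers are invented here, and the abstract does not specify whether the entropy term is evaluated on the current belief or on the updated belief after observation.

import numpy as np

def belief_entropy(belief):
    # Shannon entropy of a belief state; the small epsilon guards log(0)
    return -np.sum(belief * np.log(belief + 1e-12))

def hybrid_value(belief, reward, action, lam):
    # Additive criterion: expected immediate reward (linear in the belief)
    # minus a weighted entropy penalty that favors informative beliefs.
    expected_reward = belief @ reward[:, action]
    return expected_reward - lam * belief_entropy(belief)

if __name__ == "__main__":
    b = np.array([0.7, 0.2, 0.1])    # belief over 3 hidden states
    R = np.array([[1.0, 0.0],        # R[s, a] for 2 actions
                  [0.0, 1.0],
                  [0.5, 0.5]])
    print(hybrid_value(b, R, action=0, lam=0.5))

The weight lam sets the trade-off the abstract alludes to: with lam = 0 the criterion reduces to the usual reward-only (AI planning) objective, while large lam pushes the agent toward actions that concentrate the belief, as in active perception.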
