Starting AI Researchers' Symposium

POMDP solving: what rewards do you really expect at execution?



Abstract

Partially Observable Markov Decision Processes have gained increasing interest in many research communities, due to substantial improvements in their optimization algorithms and in computing capabilities. Yet most research focuses on optimizing either average accumulated rewards (AI planning) or direct entropy (active perception), whereas neither criterion matches the rewards actually gathered at execution. Indeed, the first optimization criterion linearly averages over all belief states, so it does not extract the best information from different observations, while the second one discards rewards entirely. Thus, motivated by simple demonstrative examples, we study an additive combination of these two criteria to get the best of both reward gathering and information acquisition at execution. We then compare our criterion with classical ones on a realistic multi-target recognition and tracking mission, and highlight the need to consider new hybrid non-linear criteria.
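To make the additive combination concrete, the Python sketch below mixes the standard belief-averaged reward with a negative-entropy information term. This is only an illustration under assumptions, not the authors' implementation: the names hybrid_value and belief_entropy, the trade-off weight lam, and the toy numbers are invented here, and the abstract does not specify whether the entropy term is evaluated on the current belief or on the updated belief after observation.

import numpy as np

def belief_entropy(belief):
    # Shannon entropy of a belief state; the small epsilon guards log(0)
    return -np.sum(belief * np.log(belief + 1e-12))

def hybrid_value(belief, reward, action, lam):
    # Additive criterion: expected immediate reward (linear in the belief)
    # minus a weighted entropy penalty that favors informative beliefs.
    expected_reward = belief @ reward[:, action]
    return expected_reward - lam * belief_entropy(belief)

if __name__ == "__main__":
    b = np.array([0.7, 0.2, 0.1])    # belief over 3 hidden states
    R = np.array([[1.0, 0.0],        # R[s, a] for 2 actions
                  [0.0, 1.0],
                  [0.5, 0.5]])
    print(hybrid_value(b, R, action=0, lam=0.5))

The weight lam sets the trade-off the abstract alludes to: with lam = 0 the criterion reduces to the usual reward-only (AI planning) objective, while large lam pushes the agent toward actions that concentrate the belief, as in active perception.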
