Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

Properly Acting under Partial Observability with Action Feasibility Constraints


Abstract

We introduce the Action-Constrained Partially Observable Markov Decision Process (AC-POMDP), which arose from studying critical robotic applications with damaging actions. AC-POMDPs restrict the optimized policy to apply only feasible actions: each action is feasible in a subset of the state space, and the agent can observe the set of applicable actions in the current hidden state, in addition to standard observations. We present optimality equations for AC-POMDPs, which imply operating on α-vectors defined over many different belief subspaces. We propose an algorithm named Precondition Value Iteration (PCVI), which fully exploits this specific property of the α-vectors of AC-POMDPs. We also design a relaxed version of PCVI whose complexity is exponentially smaller than that of PCVI. Experimental results on POMDP robotic benchmarks with action feasibility constraints show the benefits of explicitly exploiting the semantic richness of action-feasibility observations in AC-POMDPs over equivalent but unstructured POMDPs.
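The feasibility observation described in the abstract can be illustrated with a small belief update. The following is a minimal sketch, not the paper's PCVI algorithm: it assumes a hypothetical four-cell corridor, deterministic moves, an uninformative standard observation, and an observation model that depends only on the resulting state. All names (`FEASIBLE`, `T`, `O`, `applicable_actions`, `update_belief`) are invented for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Set

State = int
Action = str

# Hypothetical toy model: a corridor with cells 0..3.
STATES = [0, 1, 2, 3]

# Each action is feasible only in a subset of the state space.
FEASIBLE: Dict[Action, Set[State]] = {
    "left": {1, 2, 3},     # cannot move left from cell 0
    "right": {0, 1, 2},    # cannot move right from cell 3
    "stay": {0, 1, 2, 3},
}

# T[action][s][s2] = probability of reaching s2 from s (deterministic here).
T: Dict[Action, Dict[State, Dict[State, float]]] = {
    "left": {s: {s - 1: 1.0} for s in FEASIBLE["left"]},
    "right": {s: {s + 1: 1.0} for s in FEASIBLE["right"]},
    "stay": {s: {s: 1.0} for s in FEASIBLE["stay"]},
}

# O[s2][obs] = probability of the standard observation in the resulting state.
# Assumption: a single uninformative observation "none".
O: Dict[State, Dict[Hashable, float]] = {s: {"none": 1.0} for s in STATES}


def applicable_actions(state: State) -> FrozenSet[Action]:
    """Return the set of actions whose feasibility region contains `state`."""
    return frozenset(a for a, region in FEASIBLE.items() if state in region)


def update_belief(
    belief: Dict[State, float],
    action: Action,
    obs: Hashable,
    seen_actions: FrozenSet[Action],
) -> Dict[State, float]:
    """Bayes update that also conditions on the observed applicable-action set."""
    # The agent may only apply an action that is feasible in every state
    # it still considers possible.
    if any(p > 0 and s not in FEASIBLE[action] for s, p in belief.items()):
        raise ValueError(f"action {action!r} may be infeasible under this belief")

    weights: Dict[State, float] = {}
    for s2 in STATES:
        # Discard states inconsistent with the feasibility observation.
        if applicable_actions(s2) != seen_actions:
            continue
        predicted = sum(
            T[action].get(s, {}).get(s2, 0.0) * p for s, p in belief.items()
        )
        w = O[s2].get(obs, 0.0) * predicted
        if w > 0:
            weights[s2] = w

    total = sum(weights.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return {s: w / total for s, w in weights.items()}
```

In this toy model, cells 1 and 2 share the same applicable-action set, so a belief spread evenly over them cannot be narrowed by feasibility alone. After moving right, observing that only `left` and `stay` are applicable places all probability on cell 3, even though the standard observation carries no information.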
