2011 IEEE International Conference on Systems, Man, and Cybernetics

Profit sharing that can learn deterministic policy for POMDPs environments


Abstract

In this paper, we propose a Profit Sharing method that can learn a deterministic policy in POMDP environments. The proposed method obtains the deterministic policy by using the history of observations. First, the method detects states subject to perceptual aliasing, where different underlying states are perceived as the same observation. In such aliased states, the action is selected based on the history of observations. To use observation histories in action selection, rules over observation sequences and their values are defined. The deterministic policy is then learned by consulting the history of observations only where needed. We carried out a series of computer experiments and confirmed that the proposed method can detect perceptually aliased states in POMDP environments and can obtain a deterministic policy using the values of the observation-sequence rules.
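The abstract does not give the algorithm's details, but the core ideas (Profit Sharing credit assignment over rules, plus observation-sequence rules to resolve perceptual aliasing) can be illustrated with a minimal sketch. The toy task, the rule keys, and all names below are hypothetical illustrations, not the authors' implementation: a cue observation `'a'` or `'b'` is followed by the same aliased observation `'x'`, and the correct action at `'x'` depends on the cue, so a memoryless policy cannot act deterministically while a length-2 observation-sequence rule can.

```python
import random
from collections import defaultdict

random.seed(0)

ACTIONS = ["left", "right"]
GAMMA = 0.5  # geometric credit decay used in Profit Sharing reinforcement


class ProfitSharingAgent:
    """Hypothetical sketch: Profit Sharing over observation-sequence rules."""

    def __init__(self, history_len):
        self.history_len = history_len
        # Rule value table: (observation sequence, action) -> value
        self.values = defaultdict(float)

    def _key(self, history):
        # A rule condition is the last `history_len` observations.
        return tuple(history[-self.history_len:])

    def select_action(self, history, epsilon=0.0):
        if random.random() < epsilon:
            return random.choice(ACTIONS)  # exploratory action
        key = self._key(history)
        return max(ACTIONS, key=lambda a: self.values[(key, a)])

    def reinforce(self, episode, reward):
        # Profit Sharing: distribute the episode's reward backwards along
        # the fired rules with geometrically decaying credit.
        credit = reward
        for history, action in reversed(episode):
            self.values[(self._key(history), action)] += credit
            credit *= GAMMA


def run_episode(agent, epsilon):
    cue = random.choice(["a", "b"])
    history = [cue, "x"]  # 'x' is perceptually aliased across both branches
    action = agent.select_action(history, epsilon)
    reward = 1.0 if (cue, action) in {("a", "right"), ("b", "left")} else 0.0
    agent.reinforce([(list(history), action)], reward)
    return reward


agent = ProfitSharingAgent(history_len=2)  # rules over length-2 sequences
for _ in range(500):
    run_episode(agent, epsilon=0.3)

# The greedy policy is now deterministic and disambiguates the aliased 'x':
print(agent.select_action(["a", "x"]), agent.select_action(["b", "x"]))
```

With `history_len=1` the rule key would be `('x',)` in both branches, so the two situations would share one value entry and no deterministic policy could separate them; extending the rule condition to the observation sequence is what makes the deterministic policy learnable here.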
