Conference: Genetic and Evolutionary Computation Conference

Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains


Abstract

In this paper, we introduce FirstVisit Profit-sharing (FVPS), a credit assignment procedure; credit assignment is an important issue in classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules so that an agent acquires stochastic policies that let it behave robustly in uncertain domains, without pre-defined knowledge or subgoals. We use an internal episodic memory not only to identify perceptually aliased states but also to discard looping behavior and to acquire effective stochastic policies for escaping perceptually deceptive states. We demonstrate the effectiveness of our method on some typical classes of Partially Observable Markov Decision Processes, comparing it with Sarsa(λ) using a replacing eligibility trace. We claim that this approach yields an effective policy, stochastic or deterministic, that is appropriate for the environment.
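The abstract does not give the FVPS update rule, so the following is only a rough sketch of the general idea it describes: an episodic memory records the rules fired during an episode, and each rule is reinforced once, at its first visit, so that looping behavior earns no extra credit. The geometric credit function, the `decay` and `floor` parameters, and the function names `first_visit_profit_sharing` and `select_action` are all assumptions made for illustration and are not taken from the paper.

```python
import random


def first_visit_profit_sharing(weights, episode, reward, decay=0.5):
    """Reinforce each (observation, action) rule at most once per episode.

    weights: dict mapping (observation, action) -> accumulated credit.
    episode: episodic memory, a list of (observation, action) pairs in
             the order they were executed; the last pair earned `reward`.
    decay:   ASSUMED geometric credit function (hypothetical choice).
    """
    last = len(episode) - 1
    seen = set()
    for t, rule in enumerate(episode):
        if rule in seen:
            # Repeated visit (for example a loop): no additional credit.
            continue
        seen.add(rule)
        # Credit shrinks with the distance from the rewarded step.
        credit = reward * decay ** (last - t)
        weights[rule] = weights.get(rule, 0.0) + credit
    return weights


def select_action(weights, observation, actions, rng=random, floor=1e-3):
    """Stochastic policy: pick an action with probability proportional
    to its rule weight. `floor` (an assumption) keeps every action
    possible, which helps in states that look identical to the agent."""
    scores = [weights.get((observation, a), 0.0) + floor for a in actions]
    return rng.choices(actions, weights=scores, k=1)[0]


# Example episode in which the rule ("s0", "a") is fired twice.
weights = {}
episode = [("s0", "a"), ("s1", "b"), ("s0", "a"), ("s2", "c")]
first_visit_profit_sharing(weights, episode, reward=1.0, decay=0.5)
action = select_action(weights, "s0", ["a", "b"])
```

In this example the second occurrence of `("s0", "a")` receives nothing, so a rule cannot accumulate weight merely by being part of a loop. The paper's actual credit function and its comparison with Sarsa(λ) may differ from this sketch.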
