The Journal of Artificial Intelligence Research

Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man

Abstract

In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist.
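The abstract's core idea can be sketched in code: a policy is a list of rules chosen from a large pool, and the cross-entropy (CE) method learns which rules to include by iteratively sampling rule subsets and shifting per-rule inclusion probabilities toward the best-scoring samples. The sketch below is a toy illustration under stated assumptions, not the paper's Ms. Pac-Man setup: the rule pool is abstract indices, and the fitness function is a stand-in that rewards a hypothetical "good" subset of rules while penalizing long decision lists (the paper's bias toward low-complexity policies).

```python
import random

random.seed(0)

POOL_SIZE = 20                 # number of candidate rules in the pool (assumed)
TARGET = {1, 4, 7, 11}         # hypothetical "good" rule subset (assumed)

def fitness(mask):
    """Toy score: +2 per good rule included, -1 per rule in the list,
    so shorter lists that hit the good rules score highest."""
    chosen = {i for i, bit in enumerate(mask) if bit}
    return 2 * len(chosen & TARGET) - len(chosen)

def cross_entropy_select(iterations=50, population=100,
                         elite_frac=0.1, alpha=0.7):
    """CE method over independent Bernoulli inclusion probabilities,
    one per rule in the pool."""
    probs = [0.5] * POOL_SIZE          # start: each rule equally likely
    n_elite = max(1, int(elite_frac * population))
    for _ in range(iterations):
        # Sample a population of candidate rule subsets (bit masks).
        samples = [[int(random.random() < p) for p in probs]
                   for _ in range(population)]
        samples.sort(key=fitness, reverse=True)
        elite = samples[:n_elite]
        # Move each rule's probability toward its frequency in the elite set.
        for i in range(POOL_SIZE):
            freq = sum(s[i] for s in elite) / n_elite
            probs[i] = alpha * freq + (1 - alpha) * probs[i]
    # Final policy: include the rules the distribution has converged on.
    return [int(p > 0.5) for p in probs]

best = cross_entropy_select()
print(sorted(i for i, bit in enumerate(best) if bit))
```

On this separable toy objective the distribution typically concentrates on the target rules within a few dozen iterations; the paper's actual search runs over hand-crafted and automatically generated game rules with game score as fitness.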
