International Florida Artificial Intelligence Research Society Conference

Exploiting Key Events for Learning Interception Policies

Abstract

One scenario that commonly arises in computer games and military training simulations is predator-prey pursuit, in which the goal of the non-player character agent is to successfully intercept a fleeing player. In this paper, we focus on a variant of the problem in which the agent does not have perfect information about the player's location but has prior experience in combating the player. Effectively addressing this problem requires a combination of learning the opponent's tactics while planning an interception strategy. Although for small maps solving the problem with standard POMDP (Partially Observable Markov Decision Process) solvers is feasible, increasing the search area renders many standard techniques intractable due to the increase in belief state size and required plan length. Here we introduce a new approach for solving the problem on large maps that exploits key events, high reward regions in the belief state discovered at the higher level of abstraction, to plan efficiently over the low-level map. We demonstrate that our hierarchical key-events planner can learn intercept policies from traces of previous pursuits significantly faster than a standard point-based POMDP solver, particularly as the maps scale in size.
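The abstract does not spell out the planner's internals, but the core idea it describes — maintain a belief over the fleeing player's location, extract high-reward "key event" regions from that belief, and hand them to a low-level planner as interception subgoals — can be sketched. The toy Python below is an illustration under assumed simplifications, not the paper's algorithm: a small toroidal grid map, a random-walk prey motion model standing in for tactics learned from pursuit traces, and a greedy low-level step. The names diffuse, pick_key_event, and step_toward are hypothetical.

```python
import numpy as np

GRID = 16  # toy map size; the paper targets much larger maps

def diffuse(belief):
    """Predict step: 4-neighbour random-walk motion model for the prey.
    A crude stand-in for the tactics the paper learns from traces of
    previous pursuits. np.roll wraps at the edges (toroidal toy map)."""
    b = 0.2 * belief
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        b += 0.2 * np.roll(belief, shift, axis=axis)
    return b / b.sum()

def pick_key_event(belief, mass_threshold=0.9):
    """Approximate a 'key event' as the centroid of the smallest set of
    cells holding most of the belief mass; it becomes the subgoal."""
    order = np.argsort(belief, axis=None)[::-1]
    mass, cells = 0.0, []
    for idx in order:
        cells.append(np.unravel_index(idx, belief.shape))
        mass += belief.flat[idx]
        if mass >= mass_threshold:
            break
    ys, xs = zip(*cells)
    return int(round(sum(ys) / len(ys))), int(round(sum(xs) / len(xs)))

def step_toward(agent, goal):
    """Low-level planner: one greedy grid step toward the subgoal."""
    ay, ax = agent
    gy, gx = goal
    return int(ay + np.sign(gy - ay)), int(ax + np.sign(gx - ax))

# One pursuit episode: the prey is hidden, but its start region is known.
belief = np.zeros((GRID, GRID))
belief[GRID - 1, GRID - 1] = 1.0        # prior: prey last seen in a corner
agent = (0, 0)
for t in range(40):
    belief = diffuse(belief)            # predict prey motion
    goal = pick_key_event(belief)       # high-level subgoal from the belief
    agent = step_toward(agent, goal)    # low-level move on the map
    belief[agent] = 0.0                 # null observation at the agent's cell
    belief /= belief.sum()
print("agent ended at", agent, "pursuing subgoal", goal)
```

The hierarchical split is the point of the sketch: the belief-space reasoning (where is the prey likely to be?) happens once per step at the abstract level, while path execution stays on the concrete map, which is what lets the approach scale where flat point-based POMDP solvers do not.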
