Automatic landmark discovery for learning agents under partial observability


Abstract

In the reinforcement learning context, a landmark is a compact piece of information that is uniquely coupled with a state, for problems with hidden states. Landmarks have been shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(lambda), is known to promise better learning performance under the assumption that all landmarks of the problem are known in advance.

In this paper, we propose a framework built upon SarsaLandmark that automatically identifies landmarks within the problem during learning, without sacrificing quality and without requiring prior information about the problem structure. For this purpose, the framework fuses SarsaLandmark with a well-known multiple-instance learning algorithm, namely Diverse Density (DD). Through further experimentation, we also provide deeper insight into our concept filtering heuristic for accelerating DD, abbreviated as DDCF (Diverse Density with Concept Filtering), which proves suitable for POMDPs with landmarks. DDCF outperforms its antecedent in terms of computation speed and solution quality without loss of generality.

The methods are empirically shown to be effective via extensive experimentation on a number of known and newly introduced problems with hidden state, and the results are discussed.
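As a rough illustration of the multiple-instance idea the abstract refers to, the sketch below scores candidate concepts with the standard noisy-or form of Diverse Density. It is not the authors' framework or their DDCF heuristic. The function names, the representation of observations as feature tuples, and the Gaussian-like membership probability `exp(-squared distance)` are all assumptions made for the example. A concept scores highly when every positive bag contains at least one instance near it and no negative bag does.

```python
import math


def instance_prob(instance, concept):
    """Assumed membership model: exp(-squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(instance, concept))
    return math.exp(-sq_dist)


def diverse_density(concept, positive_bags, negative_bags):
    """Noisy-or Diverse Density of one candidate concept point."""
    dd = 1.0
    for bag in positive_bags:
        # Probability that no instance in the bag matches the concept.
        miss = math.prod(1.0 - instance_prob(x, concept) for x in bag)
        # A positive bag needs at least one matching instance.
        dd *= 1.0 - miss
    for bag in negative_bags:
        # A negative bag must contain no matching instance.
        dd *= math.prod(1.0 - instance_prob(x, concept) for x in bag)
    return dd


def best_concept(candidates, positive_bags, negative_bags):
    """Return the candidate with the highest Diverse Density."""
    return max(
        candidates,
        key=lambda c: diverse_density(c, positive_bags, negative_bags),
    )


# Hypothetical data: each bag is a list of observation feature tuples.
positive_bags = [[(0.0, 0.0), (5.0, 5.0)], [(5.0, 5.0), (9.0, 1.0)]]
negative_bags = [[(0.0, 0.0), (9.0, 1.0)]]

# Use every instance seen in a positive bag as a candidate concept.
candidates = sorted({x for bag in positive_bags for x in bag})
print(best_concept(candidates, positive_bags, negative_bags))
```

In this toy data only `(5.0, 5.0)` appears in both positive bags and in no negative bag, so it receives the highest score. Evaluating every candidate against every bag is the expensive part of plain DD, which is the cost a filtering heuristic such as the DDCF mentioned in the abstract would aim to reduce.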

Bibliographic details

  • Source
    The Knowledge Engineering Review | 2019, issue 2019 | e11.1-e11.17 | 17 pages
  • Author affiliations

    Middle East Tech Univ, Dept Comp Engn, TR-06800 Ankara, Turkey

    STM Def Technol Engn & Trade Inc, RF & Simulat Syst Directorate, TR-06530 Ankara, Turkey

  • Original format: PDF
  • Language: English
