Journal: Neurocomputing

Identification and off-policy learning of multiple objectives using adaptive clustering

Abstract

In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The objectives are identified using an online, unsupervised adaptive clustering algorithm. The identified objectives are then learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown, or in real-world scenarios where exploration is typically a time- and energy-intensive process. The implications and possible extensions of this work are also briefly discussed. (C) 2017 Elsevier B.V. All rights reserved.
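
The idea of learning several objectives in parallel from one stream of exploratory actions can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a hypothetical 1-D corridor environment, tabular Q-values rather than value function weights, and goal states that are given up front instead of being discovered by adaptive clustering. The names `parallel_q_learning` and `greedy_action` are made up for this illustration.

```python
import random


def parallel_q_learning(n_states, goals, episodes=200, steps=50,
                        alpha=0.5, gamma=0.9, seed=0):
    """Learn one Q-table per goal from a single random-exploration stream.

    Assumed toy setting: states 0..n_states-1 on a line, two actions
    (0 = step left, 1 = step right), reward 1 on entering a goal state.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    # One independent Q-table per objective: q[goal][state][action].
    q = {g: [[0.0, 0.0] for _ in range(n_states)] for g in goals}

    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(steps):
            # Behaviour policy: uniform random, independent of any objective.
            a = rng.randrange(2)
            s2 = min(max(s + moves[a], 0), n_states - 1)

            # Off-policy step: the same transition updates every objective.
            for g in goals:
                if s == g:
                    continue  # the goal is terminal for its own objective
                if s2 == g:
                    target = 1.0
                else:
                    target = gamma * max(q[g][s2])
                q[g][s][a] += alpha * (target - q[g][s][a])
            s = s2
    return q


def greedy_action(q_table, s):
    """Return the action with the highest learned value in state s."""
    return max(range(2), key=lambda a: q_table[s][a])
```

For example, calling `parallel_q_learning(7, (0, 6))` yields two Q-tables from the same random walk; the greedy policy derived from each table heads toward its own goal even though the agent never explored with either objective in mind.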
