Identification and off-policy learning of multiple objectives using adaptive clustering

Karimpanal Thommen George; Wilhelm Erik

Abstract

In this work, we present a methodology that enables an agent to make efficient use of its exploratory actions by autonomously identifying possible objectives in its environment and learning them in parallel. The identification of objectives is achieved using an online and unsupervised adaptive clustering algorithm. The identified objectives are learned (at least partially) in parallel using Q-learning. Using a simulated agent and environment, it is shown that the converged or partially converged value function weights resulting from off-policy learning can be used to accumulate knowledge about multiple objectives without any additional exploration. We claim that the proposed approach could be useful in scenarios where the objectives are initially unknown or in real-world scenarios where exploration is typically a time- and energy-intensive process. The implications and possible extensions of this work are also briefly discussed. (C) 2017 Elsevier B.V. All rights reserved.
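
The idea of learning several objectives in parallel from a single stream of exploratory actions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical 1-D chain environment, a tabular Q-function rather than the value function weights mentioned in the abstract, and goal states that are already known (standing in for the objectives the clustering step would identify). The function name `learn_objectives_in_parallel` and all parameters are illustrative assumptions.

```python
import numpy as np


def learn_objectives_in_parallel(n_states, goals, n_steps=20000,
                                 alpha=0.1, gamma=0.9, seed=0):
    """Tabular off-policy Q-learning for several goals on a 1-D chain.

    Assumptions (not from the paper): a chain of `n_states` cells, two
    actions (0 = left, 1 = right), and one goal state per objective.
    Every objective is updated from the same random-walk transitions,
    so no objective needs exploration of its own.
    """
    rng = np.random.default_rng(seed)
    goals = np.asarray(goals)
    n_objectives = len(goals)
    n_actions = 2
    # One Q-table per objective: shape (objectives, states, actions).
    q = np.zeros((n_objectives, n_states, n_actions))
    objective_idx = np.arange(n_objectives)

    state = int(rng.integers(n_states))
    for _ in range(n_steps):
        # Behaviour policy: uniformly random, independent of any objective.
        action = int(rng.integers(n_actions))
        step = 1 if action == 1 else -1
        next_state = int(np.clip(state + step, 0, n_states - 1))

        # Each objective scores the same transition with its own reward.
        reached = next_state == goals
        reward = reached.astype(float)
        # Off-policy target: greedy value of the next state, zero at a goal.
        bootstrap = np.where(reached, 0.0, q[:, next_state, :].max(axis=1))
        target = reward + gamma * bootstrap

        td_error = target - q[objective_idx, state, action]
        q[objective_idx, state, action] += alpha * td_error
        state = next_state
    return q
```

Calling `learn_objectives_in_parallel(7, [0, 3, 6]).argmax(axis=2)` returns one greedy action per objective and state. Because Q-learning is off-policy, each table can converge toward its own goal even though the agent only ever follows the shared random behaviour policy.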


Bibliographic details

Source
Neurocomputing | 2017, Issue 8 | pp. 39-47 | 9 pages
Authors
Karimpanal Thommen George; Wilhelm Erik
Author affiliations

Singapore Univ Technol & Design, Engn Prod Dev, 8 Somapah Rd, Singapore 487372, Singapore;

Singapore Univ Technol & Design, Engn Prod Dev, 8 Somapah Rd, Singapore 487372, Singapore;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Reinforcement learning; Q-learning; Off-policy; Adaptive clustering; Multiobjective learning

Similar literature

1. Off-policy learning for adaptive optimal output synchronization of heterogeneous multi-agent systems [J]. Chen Ci, Lewis Frank L., Xie Kan. Automatica. 2020, Issue 1.
2. Adaptive Trade-Offs in Off-Policy Learning [J]. Mark Rowland, Will Dabney, Remi Munos. JMLR: Workshop and Conference Proceedings. 2020, Issue 2010.
3. Adaptive importance sampling for value function approximation in off-policy reinforcement learning [J]. Hachiya H, Akiyama T, Sugiyama. Neural Networks: The Official Journal of the International Neural Network Society. 2009, Issue 10.
4. Reinforcement learning off-policy methods in adaptive packet routing algorithms [C]. Yvan Tupac, Marley Vellasco, Marco Pacheco. IASTED International Conference on Artificial Intelligence and Soft Computing. 2001.
5. Optimal placement of wind turbines on non-flat terrain using cluster identification and multi-objective genetic algorithm [D]. Garcia Rosales, Carlos Alejandro. 2012.
6. An Opposition-Based Evolutionary Algorithm for Many-Objective Optimization with Adaptive Clustering Mechanism [O]. Wan Liang Wang, Weikun Li, Yu Le Wang. 2019.
7. Identification and Off-Policy Learning of Multiple Objectives Using Adaptive Clustering [O]. Karimpanal, Thommen George, Wilhelm, Erik. 2017.
