Zero-shot policy generation in lifelong reinforcement learning

Qian Yi-Ming; Xiong Fang-Zhou; Liu Zhi-Yong

Abstract

Lifelong reinforcement learning (LRL) is an important approach to continual lifelong learning across multiple reinforcement learning tasks. The two major methods used in LRL are task decomposition and policy knowledge extraction. Policy knowledge extraction can share knowledge among tasks in different task domains and among tasks in the same task domain that have different system environmental coefficients. However, its generalization ability is limited to learned tasks and does not extend to learned task domains. In this paper, we propose a cross-domain lifelong reinforcement learning algorithm with zero-shot policy generation ability (CDLRL-ZPG), which extends the generalization ability of policy knowledge extraction from learned tasks to learned task domains. In experiments, we evaluated the performance of CDLRL-ZPG on four task domains. The results show that the proposed algorithm can directly generate satisfactory results without a trial-and-error learning process, achieving zero-shot learning in general. (c) 2021 Elsevier B.V. All rights reserved.

Bibliographic details

Source
Neurocomputing | 2021, No. 25 | pp. 65-73 | 9 pages
Authors
Qian Yi-Ming; Xiong Fang-Zhou; Liu Zhi-Yong
Author affiliations

Univ Chinese Acad Sci (UCAS), Sch Artificial Intelligence, Beijing 100049, Peoples R China | Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China

Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China | Meituan, Beijing, Peoples R China

Univ Chinese Acad Sci (UCAS), Sch Artificial Intelligence, Beijing 100049, Peoples R China | Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China | Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Shanghai 200031, Peoples R China

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language of text: English
Keywords
Lifelong reinforcement learning; Generalization policy; Task domain;
