IEEE Transactions on Industrial Informatics
Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces


Abstract

The sequential decision-making problem with large-scale state spaces is an important and challenging topic in multitask reinforcement learning (MTRL). Training near-optimal policies across tasks suffers from a deficiency of prior knowledge in discrete-time nonlinear environments, especially under continuous task variations, and scalable approaches are required to transfer prior knowledge to new tasks when the number of tasks is large. This paper proposes a multitask policy adversarial learning (MTPAL) method for learning a nonlinear feedback policy that generalizes across multiple tasks, bringing a robot's cognitive ability much closer to human-level decision making. The key idea is to construct a parameterized policy model directly from large, high-dimensional observations using deep function approximators, and then to train an optimal sequential decision policy for each new task through an adversarial process in which two models are trained simultaneously: a multitask policy generator transforms samples drawn from a prior distribution into samples from a complex, higher-dimensional data distribution, and a multitask policy discriminator decides whether a given sample comes from the human-level, empirically derived prior distribution or from the generator. All the related human-level empirical knowledge is integrated into the sequential decision policy, transferring human-level policy at every layer of a deep policy network. Extensive experimental results on four different WeiChai Power manufacturing data sets show that our approach can simultaneously surpass human performance on tasks ranging from cart-pole to production assembly control.
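The adversarial process the abstract describes — a generator mapping prior-distribution samples toward the expert data distribution while a discriminator tries to tell them apart — can be illustrated with a minimal 1-D sketch. Everything below is an illustrative assumption, not the paper's implementation: the linear generator/discriminator, the learning rates, and the stand-in "human-level" data distribution are all hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical 1-D sketch of a generator/discriminator adversarial update.
# The "expert" samples stand in for human-level empirically derived data;
# the paper uses deep function approximators, not these linear models.

g_w, g_b = 1.0, 0.0   # generator parameters: action = g_w * z + g_b
d_w, d_b = 0.1, 0.0   # logistic discriminator parameters

def generate(z):
    """G: prior noise sample -> candidate action."""
    return g_w * z + g_b

def discriminate(a):
    """D: action -> probability the action is 'human-level' (expert)."""
    return 1.0 / (1.0 + math.exp(-(d_w * a + d_b)))

LR, BATCH = 0.05, 64
for _ in range(500):
    zs = [random.gauss(0.0, 1.0) for _ in range(BATCH)]        # prior samples
    expert = [random.gauss(2.0, 0.5) for _ in range(BATCH)]    # stand-in expert data
    fake = [generate(z) for z in zs]

    # Discriminator: gradient ascent on E[log D(expert)] + E[log(1 - D(fake))].
    grad_dw = (sum((1 - discriminate(a)) * a for a in expert)
               + sum(-discriminate(a) * a for a in fake)) / BATCH
    grad_db = (sum(1 - discriminate(a) for a in expert)
               + sum(-discriminate(a) for a in fake)) / BATCH
    d_w += LR * grad_dw
    d_b += LR * grad_db

    # Generator: gradient ascent on E[log D(fake)], chain rule through G.
    g_w += LR * sum((1 - discriminate(a)) * d_w * z
                    for a, z in zip(fake, zs)) / BATCH
    g_b += LR * sum((1 - discriminate(a)) * d_w for a in fake) / BATCH

# Mean action the trained generator produces from prior noise.
mean_action = sum(generate(random.gauss(0.0, 1.0)) for _ in range(1000)) / 1000
```

Each iteration alternates a discriminator step (pushing expert scores up, generated scores down) with a generator step (pushing generated actions toward regions the discriminator rates as expert-like) — the same two-player structure the abstract attributes to the multitask policy generator and discriminator.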
