IEEE Transactions on Industrial Informatics
Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces


Abstract

The sequential decision-making problem with large-scale state spaces is an important and challenging topic in multitask reinforcement learning (MTRL). Training near-optimal policies across tasks suffers from a deficiency of prior knowledge in discrete-time nonlinear environments, especially under continuous task variations, and scalable approaches are required to transfer prior knowledge to new tasks when the number of tasks is large. This paper proposes a multitask policy adversarial learning (MTPAL) method for learning a nonlinear feedback policy that generalizes across multiple tasks, bringing a robot's cognitive ability much closer to human-level decision making. The key idea is to construct a parameterized policy model directly from large, high-dimensional observations using deep function approximators, and then to train an optimal sequential decision policy for each new task through an adversarial process in which two models are trained simultaneously: a multitask policy generator transforms samples drawn from a prior distribution into samples from a complex, higher-dimensional data distribution, and a multitask policy discriminator decides whether a given sample comes from the human-level, empirically derived prior distribution or from the generator. All the related human-level empirical knowledge is integrated into the sequential decision policy, transferring human-level policy at every layer of a deep policy network. Extensive experimental results on four different WeiChai Power manufacturing data sets show that our approach can simultaneously surpass human performance on tasks ranging from cart-pole to production assembly control.
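The adversarial process the abstract describes — a generator mapping prior-distribution samples toward the expert data distribution while a discriminator tries to tell them apart — can be illustrated with a minimal 1-D sketch. Everything below is an illustrative assumption, not the paper's implementation: the linear generator/discriminator, the learning rates, and the stand-in "human-level" data distribution are all hypothetical.

```python
import math
import random

random.seed(0)

# Hypothetical 1-D sketch of a generator/discriminator adversarial update.
# The "expert" samples stand in for human-level empirically derived data;
# the paper uses deep function approximators, not these linear models.

g_w, g_b = 1.0, 0.0   # generator parameters: action = g_w * z + g_b
d_w, d_b = 0.1, 0.0   # logistic discriminator parameters

def generate(z):
    """G: prior noise sample -> candidate action."""
    return g_w * z + g_b

def discriminate(a):
    """D: action -> probability the action is 'human-level' (expert)."""
    return 1.0 / (1.0 + math.exp(-(d_w * a + d_b)))

LR, BATCH = 0.05, 64
for _ in range(500):
    zs = [random.gauss(0.0, 1.0) for _ in range(BATCH)]        # prior samples
    expert = [random.gauss(2.0, 0.5) for _ in range(BATCH)]    # stand-in expert data
    fake = [generate(z) for z in zs]

    # Discriminator: gradient ascent on E[log D(expert)] + E[log(1 - D(fake))].
    grad_dw = (sum((1 - discriminate(a)) * a for a in expert)
               + sum(-discriminate(a) * a for a in fake)) / BATCH
    grad_db = (sum(1 - discriminate(a) for a in expert)
               + sum(-discriminate(a) for a in fake)) / BATCH
    d_w += LR * grad_dw
    d_b += LR * grad_db

    # Generator: gradient ascent on E[log D(fake)], chain rule through G.
    g_w += LR * sum((1 - discriminate(a)) * d_w * z
                    for a, z in zip(fake, zs)) / BATCH
    g_b += LR * sum((1 - discriminate(a)) * d_w for a in fake) / BATCH

# Mean action the trained generator produces from prior noise.
mean_action = sum(generate(random.gauss(0.0, 1.0)) for _ in range(1000)) / 1000
```

Each iteration alternates a discriminator step (pushing expert scores up, generated scores down) with a generator step (pushing generated actions toward regions the discriminator rates as expert-like) — the same two-player structure the abstract attributes to the multitask policy generator and discriminator.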
