JMLR: Workshop and Conference Proceedings

Convergence of Value Aggregation for Imitation Learning

Abstract

Value aggregation is a general framework for solving imitation learning problems. Based on the idea of data aggregation, it generates a policy sequence by iteratively interleaving policy optimization and evaluation in an online learning setting. While the existence of a good policy in the policy sequence can be guaranteed non-asymptotically, little is known about the convergence of the sequence or the performance of the last policy. In this paper, we debunk the common belief that value aggregation always produces a convergent policy sequence with improving performance. Moreover, we identify a critical stability condition for convergence and provide a tight non-asymptotic bound on the performance of the last policy. These new theoretical insights let us stabilize problems with regularization, which removes the inconvenient process of identifying the best policy in the policy sequence in stochastic problems.
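As context for how the framework operates, below is a minimal sketch of a value-aggregation-style loop on a toy one-dimensional problem. The induced state distribution, the expert, the quadratic surrogate loss, and the proximal regularizer lam are illustrative assumptions rather than the construction analyzed in the paper; the sketch only shows the interleaving of policy evaluation (rollouts), data aggregation, and regularized policy optimization that the abstract describes.

import numpy as np

# Toy 1-D imitation problem used only to illustrate the loop structure.
rng = np.random.default_rng(0)

def rollout_states(theta, n=200):
    # Policy evaluation: sample states visited under the current policy.
    # Assumption (for illustration): the induced state distribution is a
    # Gaussian centered at the scalar policy parameter.
    return rng.normal(loc=theta, scale=1.0, size=n)

def expert_action(states):
    # Hypothetical expert the learner tries to imitate.
    return np.tanh(states)

def ftrl_update(datasets, theta_prev, lam):
    # Policy optimization as an online-learning (follow-the-regularized-leader) step:
    #   theta_new = argmin_theta  sum over all aggregated states s of (theta - expert(s))^2
    #                             + lam * (theta - theta_prev)^2
    # With a quadratic surrogate loss the minimizer has a closed form.
    targets = np.concatenate([expert_action(s) for s in datasets])
    return (targets.sum() + lam * theta_prev) / (targets.size + lam)

def value_aggregation(n_iters=20, lam=0.0):
    theta, datasets = 0.0, []
    for _ in range(n_iters):
        states = rollout_states(theta)              # evaluate the current policy
        datasets.append(states)                     # aggregate data across iterations
        theta = ftrl_update(datasets, theta, lam)   # optimize on the aggregated data
    return theta                                    # the last policy in the sequence

print("last policy, no regularization:  ", value_aggregation(lam=0.0))
print("last policy, with regularization:", value_aggregation(lam=50.0))

Setting lam > 0 plays the role of the regularization discussed in the abstract: it is what lets the policy sequence be stabilized so that the last policy, rather than the best policy found along the way, can be relied on.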
