Conference on Neural Information Processing Systems

Constrained Reinforcement Learning Has Zero Duality Gap



Abstract

Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time/energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning (RL), these problems are addressed by (i) designing a reward function that simultaneously describes all requirements or (ii) combining modular value functions that encode them individually. Though effective, these methods have critical downsides. Designing good reward functions that balance different objectives is challenging, especially as the number of objectives grows. Moreover, implicit interference between goals may lead to performance plateaus as they compete for resources, particularly when training on-policy. Similarly, selecting parameters to combine value functions is at least as hard as designing an all-encompassing reward, given that the effect of their values on the overall policy is not straightforward. The latter issue is generally addressed by formulating the conflicting requirements as a constrained RL problem and solving it with primal-dual methods. These algorithms are, in general, not guaranteed to converge to the optimal solution, since the problem is not convex. This work provides theoretical support for these approaches by establishing that, despite its non-convexity, this problem has zero duality gap, i.e., it can be solved exactly in the dual domain, where it becomes convex. Finally, we show that this result essentially still holds when the policy is described by a rich enough parametrization (e.g., neural networks), connect it with primal-dual algorithms present in the literature, and establish their convergence to the optimal solution.
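
For concreteness, the following is a minimal sketch of the constrained RL formulation and its Lagrangian dual that the abstract refers to. The notation (objective reward r_0, constraint rewards r_i, thresholds c_i, discount factor γ) is illustrative and not taken verbatim from the paper.

```latex
% Constrained RL problem (illustrative notation): maximize the expected
% discounted objective reward subject to expected constraint-reward requirements.
\begin{align}
P^\star = \max_{\pi}\;& V_0(\pi) \triangleq \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r_0(s_t,a_t)\right] \\
\text{s.t.}\;\;& V_i(\pi) \triangleq \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^t\, r_i(s_t,a_t)\right] \ge c_i, \quad i = 1,\dots,m.
\end{align}
% Lagrangian and dual problem:
\begin{align}
\mathcal{L}(\pi,\lambda) &= V_0(\pi) + \sum_{i=1}^{m} \lambda_i \bigl(V_i(\pi) - c_i\bigr), \\
D^\star &= \min_{\lambda \ge 0}\; \max_{\pi}\; \mathcal{L}(\pi,\lambda).
\end{align}
% The zero duality gap result states D^* = P^*: even though the primal problem
% is non-convex, it can be solved exactly in the (convex) dual domain.
```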
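
The primal-dual algorithms mentioned in the abstract alternate between a policy improvement step on the Lagrangian and a projected subgradient step on the multipliers. Below is a minimal, hypothetical sketch of that alternation on a toy single-state problem (a constrained bandit); the rewards, threshold, and step sizes are invented for illustration and are not from the paper.

```python
# Illustrative primal-dual sketch for a toy constrained problem
# (a single-state "bandit" stand-in for constrained RL).
import numpy as np

r0 = np.array([1.0, 0.2])   # objective reward per action
r1 = np.array([0.0, 1.0])   # constraint reward per action
c = 0.7                     # constraint: E_pi[r1] >= c

theta = np.zeros(2)         # softmax policy parameters (primal variable)
lam = 0.0                   # Lagrange multiplier (dual variable)
eta_theta, eta_lam = 0.5, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    pi = softmax(theta)

    # Primal step: exact policy-gradient ascent on the Lagrangian
    #   L(pi, lam) = E_pi[r0] + lam * (E_pi[r1] - c).
    r_lag = r0 + lam * r1
    grad_theta = pi * (r_lag - pi @ r_lag)
    theta += eta_theta * grad_theta

    # Dual step: projected subgradient descent on the dual function,
    #   lam <- max(0, lam - eta * (E_pi[r1] - c)).
    lam = max(0.0, lam - eta_lam * (pi @ r1 - c))

pi = softmax(theta)
print("policy:", pi, "E[r0]:", pi @ r0, "E[r1]:", pi @ r1, "lambda:", lam)
```

In practice such iterates can oscillate around the saddle point; averaging the iterates or using smaller dual step sizes is a common remedy.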
