Journal of Global Optimization

A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning



Abstract

We investigate a powerful nonconvex optimization approach based on Difference of Convex functions (DC) programming and the DC Algorithm (DCA) for reinforcement learning, a general class of machine learning techniques that aims to estimate the optimal learning policy in a dynamic environment, typically formulated as a Markov decision process (with an incomplete model). The problem is tackled as finding a zero of the so-called optimal Bellman residual via linear value-function approximation, for which two optimization models are proposed: minimizing the p-norm of a vector-valued convex function, and minimizing a concave function under linear constraints. Both are formulated as DC programs for which attractive DCA schemes are developed. Numerical experiments on various instances of two benchmark Markov decision process problems, Garnet and Gridworld, show the efficiency of our approaches in comparison with two existing DCA based algorithms and two state-of-the-art reinforcement learning algorithms.
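For orientation, the optimal Bellman residual formulation referred to above can be sketched in standard notation; the symbols below follow common reinforcement learning conventions and are our assumption, not quoted from the paper's full text. With a feature matrix \Phi and linear value-function approximation V_\theta = \Phi\theta, the optimal Bellman operator is

    (T^* V)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],

and one seeks a parameter \theta at which the residual vanishes, for instance by minimizing \| \Phi\theta - T^*(\Phi\theta) \|_p^p. Because T^* is a pointwise maximum of maps affine in V, such objectives admit DC decompositions f = g - h with g, h convex, and the generic DCA scheme alternates

    y^k \in \partial h(x^k), \qquad x^{k+1} \in \operatorname*{arg\,min}_x \big\{ g(x) - \langle y^k, x \rangle \big\},

solving one convex subproblem per iteration until the iterates stabilize.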