Journal of Global Optimization

A unified DC programming framework and efficient DCA based approaches for large scale batch reinforcement learning



Abstract

We investigate a powerful nonconvex optimization approach based on Difference of Convex functions (DC) programming and the DC Algorithm (DCA) for reinforcement learning, a general class of machine learning techniques that aims to estimate the optimal learning policy in a dynamic environment, typically formulated as a Markov decision process (with an incomplete model). The problem is tackled as finding a zero of the so-called optimal Bellman residual via linear value-function approximation, for which two optimization models are proposed: minimizing the p-norm of a vector-valued convex function, and minimizing a concave function under linear constraints. Both are formulated as DC programs for which attractive DCA schemes are developed. Numerical experiments on various instances of two benchmark Markov decision process problems, Garnet and Gridworld, show the efficiency of our approaches in comparison with two existing DCA based algorithms and two state-of-the-art reinforcement learning algorithms.
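For orientation, the optimal Bellman residual formulation referred to above can be sketched in standard notation; the symbols below follow common reinforcement learning conventions and are our assumption, not quoted from the paper's full text. With a feature matrix \Phi and linear value-function approximation V_\theta = \Phi\theta, the optimal Bellman operator is

    (T^* V)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big],

and one seeks a parameter \theta at which the residual vanishes, for instance by minimizing \| \Phi\theta - T^*(\Phi\theta) \|_p^p. Because T^* is a pointwise maximum of maps affine in V, such objectives admit DC decompositions f = g - h with g, h convex, and the generic DCA scheme alternates

    y^k \in \partial h(x^k), \qquad x^{k+1} \in \operatorname*{arg\,min}_x \big\{ g(x) - \langle y^k, x \rangle \big\},

solving one convex subproblem per iteration until the iterates stabilize.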