
Sparse Q-learning with Mirror Descent


Abstract

This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimization of convex functions in high-dimensional spaces. Unlike traditional gradient methods, mirror descent undertakes gradient updates of weights in both the dual space and primal space, which are linked together using a Legendre transform. Mirror descent can be viewed as a proximal algorithm where the distance-generating function used is a Bregman divergence. A new class of proximal-gradient-based temporal-difference (TD) methods is presented based on different Bregman divergences, which are more powerful than regular TD learning. Examples of Bregman divergences that are studied include p-norm functions and the Mahalanobis distance based on the covariance of sample gradients. A new family of sparse mirror-descent reinforcement learning methods is proposed; these methods find sparse fixed points of an l_1-regularized Bellman equation at significantly lower computational cost than previous second-order matrix methods. An experimental study of mirror-descent reinforcement learning is presented using discrete and continuous Markov decision processes.
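The dual/primal update scheme described in the abstract is easy to make concrete. Below is a minimal Python sketch of sparse mirror-descent TD(0) with a p-norm link function, assuming linear function approximation over a feature map phi; the function names, step-size constant, and the exact placement of the soft-thresholding step are illustrative assumptions, not details taken verbatim from the paper.

```python
import numpy as np

def pnorm_link(v, p):
    """Gradient of the p-norm potential psi(v) = 0.5 * ||v||_p^2.
    For conjugate exponents p and q (1/p + 1/q = 1), this map and its
    q-norm counterpart form the Legendre-transform pair linking the
    primal and dual spaces."""
    norm = np.linalg.norm(v, p)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm ** (p - 2)

def sparse_mirror_descent_td(samples, d, gamma=0.95, alpha=0.1,
                             lam=0.01, p=None):
    """One-pass sparse mirror-descent TD(0) sketch.

    samples: iterable of (phi_s, reward, phi_s_next) tuples collected
    under a fixed policy; d: feature dimension; lam: l_1 weight.
    """
    if p is None:
        p = 2.0 * np.log(d)   # a common high-dimensional choice for p
    q = p / (p - 1.0)         # conjugate exponent
    w = np.zeros(d)           # primal weight vector
    for phi_s, r, phi_next in samples:
        delta = r + gamma * phi_next @ w - phi_s @ w      # TD error
        theta = pnorm_link(w, p) + alpha * delta * phi_s  # step in the dual space
        # Soft thresholding in the dual space enforces the l_1 penalty,
        # driving many coordinates exactly to zero (the sparse fixed point).
        theta = np.sign(theta) * np.maximum(np.abs(theta) - alpha * lam, 0.0)
        w = pnorm_link(theta, q)   # Legendre map back to the primal space
    return w

if __name__ == "__main__":
    # Toy usage on synthetic transitions (purely illustrative data).
    rng = np.random.default_rng(0)
    d = 50
    samples = [(rng.normal(size=d), rng.normal(), rng.normal(size=d))
               for _ in range(200)]
    w = sparse_mirror_descent_td(samples, d)
    print("nonzero weights:", np.count_nonzero(w), "of", d)
```

With p on the order of 2 ln d, the p-norm update behaves like an exponentiated-gradient method whose regret grows only logarithmically with the dimension; this, together with thresholding in the dual space, is what lets the approach reach sparse solutions without the second-order matrix computations the abstract contrasts against.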
