
Sparse Q-learning with Mirror Descent


Abstract

This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimization of convex functions in high-dimensional spaces. Unlike traditional gradient methods, mirror descent undertakes gradient updates of weights in both the dual space and primal space, which are linked together using a Legendre transform. Mirror descent can be viewed as a proximal algorithm where the distance-generating function used is a Bregman divergence. A new class of proximal-gradient-based temporal-difference (TD) methods is presented based on different Bregman divergences, which are more powerful than regular TD learning. Examples of Bregman divergences that are studied include p-norm functions and the Mahalanobis distance based on the covariance of sample gradients. A new family of sparse mirror-descent reinforcement learning methods is proposed; these methods find sparse fixed points of an l_1-regularized Bellman equation at significantly lower computational cost than previous second-order matrix methods. An experimental study of mirror-descent reinforcement learning is presented using discrete and continuous Markov decision processes.
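The dual/primal update scheme described in the abstract is easy to make concrete. Below is a minimal Python sketch of sparse mirror-descent TD(0) with a p-norm link function, assuming linear function approximation over a feature map phi; the function names, step-size constant, and the exact placement of the soft-thresholding step are illustrative assumptions, not details taken verbatim from the paper.

```python
import numpy as np

def pnorm_link(v, p):
    """Gradient of the p-norm potential psi(v) = 0.5 * ||v||_p^2.
    For conjugate exponents p and q (1/p + 1/q = 1), this map and its
    q-norm counterpart form the Legendre-transform pair linking the
    primal and dual spaces."""
    norm = np.linalg.norm(v, p)
    if norm == 0.0:
        return np.zeros_like(v)
    return np.sign(v) * np.abs(v) ** (p - 1) / norm ** (p - 2)

def sparse_mirror_descent_td(samples, d, gamma=0.95, alpha=0.1,
                             lam=0.01, p=None):
    """One-pass sparse mirror-descent TD(0) sketch.

    samples: iterable of (phi_s, reward, phi_s_next) tuples collected
    under a fixed policy; d: feature dimension; lam: l_1 weight.
    """
    if p is None:
        p = 2.0 * np.log(d)   # a common high-dimensional choice for p
    q = p / (p - 1.0)         # conjugate exponent
    w = np.zeros(d)           # primal weight vector
    for phi_s, r, phi_next in samples:
        delta = r + gamma * phi_next @ w - phi_s @ w      # TD error
        theta = pnorm_link(w, p) + alpha * delta * phi_s  # step in the dual space
        # Soft thresholding in the dual space enforces the l_1 penalty,
        # driving many coordinates exactly to zero (the sparse fixed point).
        theta = np.sign(theta) * np.maximum(np.abs(theta) - alpha * lam, 0.0)
        w = pnorm_link(theta, q)   # Legendre map back to the primal space
    return w

if __name__ == "__main__":
    # Toy usage on synthetic transitions (purely illustrative data).
    rng = np.random.default_rng(0)
    d = 50
    samples = [(rng.normal(size=d), rng.normal(), rng.normal(size=d))
               for _ in range(200)]
    w = sparse_mirror_descent_td(samples, d)
    print("nonzero weights:", np.count_nonzero(w), "of", d)
```

With p on the order of 2 ln d, the p-norm update behaves like an exponentiated-gradient method whose regret grows only logarithmically with the dimension; this, together with thresholding in the dual space, is what lets the approach reach sparse solutions without the second-order matrix computations the abstract contrasts against.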
