Regularized Off-Policy TD-Learning

Abstract

We present a novel l_1 regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is given, with experiments illustrating the algorithm's off-policy convergence, sparse feature selection capability, and low computational cost.
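The abstract does not state the update equations, but the general idea can be sketched: an off-policy gradient TD update in the style of TDC, combined with an l_1 penalty that drives many value-function weights to zero. The minimal sketch below illustrates this with a soft-thresholding (proximal) step standing in for the paper's saddle-point formulation; the class name, hyperparameters, and the proximal step are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l_1 norm: shrinks each coordinate toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

class L1RegularizedTDC:
    """Illustrative sketch only: TDC-style off-policy gradient TD with an l_1
    proximal step to encourage sparse value-function weights. RO-TD itself is
    derived from a convex-concave saddle-point formulation; this sketch swaps
    in simple soft-thresholding as an assumed stand-in."""

    def __init__(self, n_features, alpha=0.01, beta=0.05, lam=0.001, gamma=0.99):
        self.theta = np.zeros(n_features)   # value-function weights (made sparse)
        self.w = np.zeros(n_features)       # auxiliary correction weights used by TDC
        self.alpha, self.beta, self.lam, self.gamma = alpha, beta, lam, gamma

    def update(self, phi, reward, phi_next, rho=1.0):
        """One off-policy transition: features phi -> phi_next, importance weight rho."""
        delta = reward + self.gamma * self.theta @ phi_next - self.theta @ phi
        # TDC-style corrected gradient step on the primary weights
        self.theta += self.alpha * rho * (delta * phi
                                          - self.gamma * (self.w @ phi) * phi_next)
        # l_1 proximal (soft-thresholding) step induces sparsity in theta
        self.theta = soft_threshold(self.theta, self.alpha * self.lam)
        # auxiliary weights track the expected TD error as a function of phi
        self.w += self.beta * rho * (delta - self.w @ phi) * phi
```

Usage would follow the usual online TD pattern: for each sampled transition, call update(phi, reward, phi_next, rho) with the behavior-to-target importance weight rho; the soft-thresholding step keeps per-step cost linear in the number of features, matching the low computational complexity claimed in the abstract.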
