Regularized Off-Policy TD-Learning

Abstract

We present a novel l_1 regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity. The algorithmic framework underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization. A detailed theoretical and experimental analysis of RO-TD is given, with experiments illustrating the algorithm's off-policy convergence, sparse feature selection capability, and low computational cost.
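The abstract does not state the update equations, but the general idea can be sketched: an off-policy gradient TD update in the style of TDC, combined with an l_1 penalty that drives many value-function weights to zero. The minimal sketch below illustrates this with a soft-thresholding (proximal) step standing in for the paper's saddle-point formulation; the class name, hyperparameters, and the proximal step are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the l_1 norm: shrinks each coordinate toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

class L1RegularizedTDC:
    """Illustrative sketch only: TDC-style off-policy gradient TD with an l_1
    proximal step to encourage sparse value-function weights. RO-TD itself is
    derived from a convex-concave saddle-point formulation; this sketch swaps
    in simple soft-thresholding as an assumed stand-in."""

    def __init__(self, n_features, alpha=0.01, beta=0.05, lam=0.001, gamma=0.99):
        self.theta = np.zeros(n_features)   # value-function weights (made sparse)
        self.w = np.zeros(n_features)       # auxiliary correction weights used by TDC
        self.alpha, self.beta, self.lam, self.gamma = alpha, beta, lam, gamma

    def update(self, phi, reward, phi_next, rho=1.0):
        """One off-policy transition: features phi -> phi_next, importance weight rho."""
        delta = reward + self.gamma * self.theta @ phi_next - self.theta @ phi
        # TDC-style corrected gradient step on the primary weights
        self.theta += self.alpha * rho * (delta * phi
                                          - self.gamma * (self.w @ phi) * phi_next)
        # l_1 proximal (soft-thresholding) step induces sparsity in theta
        self.theta = soft_threshold(self.theta, self.alpha * self.lam)
        # auxiliary weights track the expected TD error as a function of phi
        self.w += self.beta * rho * (delta - self.w @ phi) * phi
```

Usage would follow the usual online TD pattern: for each sampled transition, call update(phi, reward, phi_next, rho) with the behavior-to-target importance weight rho; the soft-thresholding step keeps per-step cost linear in the number of features, matching the low computational complexity claimed in the abstract.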
