JMLR: Workshop and Conference Proceedings

Path Consistency Learning in Tsallis Entropy Regularized MDPs


Abstract

We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this sub-optimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.
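For intuition about the sparsity the abstract refers to, below is a minimal, self-contained sketch (not taken from the paper): with the entropic-index-2 Tsallis entropy, the one-step regularized greedy policy is the sparsemax projection of the (scaled) Q-values onto the probability simplex, which assigns exactly zero probability to low-value actions, whereas the softmax policy of soft ERL never does. The Q-values, the regularization weight lam, and the helper names are illustrative assumptions, not quantities from the paper.

    import numpy as np

    def sparsemax(z):
        # Euclidean projection of z onto the probability simplex
        # (Martins & Astudillo, 2016); entries below the threshold become exactly zero.
        z_sorted = np.sort(z)[::-1]
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, z.size + 1)
        support = 1.0 + k * z_sorted > cumsum
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1.0) / k_z
        return np.maximum(z - tau, 0.0)

    def tsallis_entropy(p):
        # Tsallis entropy with entropic index 2: H(p) = 0.5 * (1 - sum_a p(a)^2)
        # (scaling conventions vary; this is one common form).
        return 0.5 * (1.0 - np.sum(p ** 2))

    q = np.array([2.0, 1.9, 0.3, 0.1, -0.5])   # hypothetical Q-values for 5 actions
    lam = 1.0                                   # assumed regularization weight
    pi_sparse = sparsemax(q / lam)              # -> [0.55, 0.45, 0.0, 0.0, 0.0]
    pi_soft = np.exp(q / lam) / np.sum(np.exp(q / lam))  # softmax: every action > 0
    print(pi_sparse, tsallis_entropy(pi_sparse), pi_soft)

In this toy example only the two best actions receive probability mass under the sparse policy, while softmax spreads mass over all five; as the number of actions grows, that spread is exactly the drawback of soft ERL that the paper targets.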
