JMLR: Workshop and Conference Proceedings

Reinforcement Learning with Deep Energy-Based Policies


Abstract

We propose a method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
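For context, the maximum-entropy formulation referenced in the abstract can be sketched as follows; this is the standard soft Q-learning notation (with entropy temperature \(\alpha\)) rather than text quoted from this page:

\[
\pi^* = \arg\max_{\pi} \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[ r(s_t, a_t) + \alpha \,\mathcal{H}(\pi(\cdot \mid s_t)) \big],
\qquad
\pi^*(a_t \mid s_t) \propto \exp\!\big( \tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s_t, a_t) \big),
\]

where the corresponding soft value function is \( V_{\mathrm{soft}}(s_t) = \alpha \log \int_{\mathcal{A}} \exp\big( \tfrac{1}{\alpha} Q_{\mathrm{soft}}(s_t, a') \big)\, da' \). The amortized Stein variational gradient descent sampler mentioned in the abstract is used to draw approximate samples from this Boltzmann-form policy when the action space is continuous and the integral is intractable.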
