JMLR: Workshop and Conference Proceedings

Reinforcement Learning with Deep Energy-Based Policies


Abstract

We propose a method for learning expressive energy-based policies for continuous states and actions, which has previously been feasible only in tabular domains. We apply our method to learning maximum entropy policies, resulting in a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution. We use the recently proposed amortized Stein variational gradient descent to learn a stochastic sampling network that approximates samples from this distribution. The benefits of the proposed algorithm include improved exploration and compositionality that allows transferring skills between tasks, which we confirm in simulated experiments with swimming and walking robots. We also draw a connection to actor-critic methods, which can be viewed as performing approximate inference on the corresponding energy-based model.
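For context, the maximum-entropy formulation referenced in the abstract can be sketched as follows; this is the standard soft Q-learning notation (with entropy temperature \(\alpha\)) rather than text quoted from this page:

\[
\pi^* = \arg\max_{\pi} \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[ r(s_t, a_t) + \alpha \,\mathcal{H}(\pi(\cdot \mid s_t)) \big],
\qquad
\pi^*(a_t \mid s_t) \propto \exp\!\big( \tfrac{1}{\alpha} Q^*_{\mathrm{soft}}(s_t, a_t) \big),
\]

where the corresponding soft value function is \( V_{\mathrm{soft}}(s_t) = \alpha \log \int_{\mathcal{A}} \exp\big( \tfrac{1}{\alpha} Q_{\mathrm{soft}}(s_t, a') \big)\, da' \). The amortized Stein variational gradient descent sampler mentioned in the abstract is used to draw approximate samples from this Boltzmann-form policy when the action space is continuous and the integral is intractable.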
