Venue: Annual Conference on Information Sciences and Systems

Policy Search in Infinite-Horizon Discounted Reinforcement Learning: Advances through Connections to Non-Convex Optimization : Invited Presentation



Abstract

In reinforcement learning (RL), an agent moving through a state space selects actions that cause a transition to a new state according to an unknown Markov transition density depending on the previous state and action. After each transition, a reward indicating the quality of being in the resulting state is revealed. The goal is to select the action sequence that maximizes the long-term accumulation of rewards, or value. We focus on the case where the policy that determines how actions are chosen is a fixed stationary distribution parameterized by a vector, the problem horizon is infinite, and the states and actions belong to continuous Euclidean subsets.
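The setting the abstract describes (continuous states and actions, a vector-parameterized stochastic policy, infinite-horizon discounted reward) is the one in which policy-gradient methods such as REINFORCE operate. The sketch below is illustrative only and not the paper's algorithm: the linear-Gaussian policy, the toy scalar dynamics, and all hyperparameters are assumptions, with the discounted horizon approximated by a finite rollout.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95          # discount factor of the infinite-horizon objective
theta = np.zeros(2)   # policy parameter vector; action mean = theta @ [s, 1]
sigma = 0.5           # fixed exploration noise (assumed, for illustration)

def step(s, a):
    """Toy linear dynamics with quadratic cost (reward = -cost); a stand-in
    for the unknown Markov transition density in the abstract."""
    s_next = 0.9 * s + a + 0.1 * rng.standard_normal()
    reward = -(s_next ** 2 + 0.1 * a ** 2)
    return s_next, reward

def rollout(theta, T=50):
    """Sample one finite trajectory under the Gaussian policy."""
    s, traj = rng.standard_normal(), []
    for _ in range(T):
        feats = np.array([s, 1.0])
        a = theta @ feats + sigma * rng.standard_normal()
        s, r = step(s, a)
        traj.append((feats, a, r))
    return traj

def reinforce_update(theta, lr=1e-3):
    """One stochastic policy-gradient step on the discounted return."""
    traj = rollout(theta)
    rewards = np.array([r for _, _, r in traj])
    # Discounted returns-to-go: G_t = sum_{k >= t} gamma^(k-t) * r_k
    G, running = np.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    grad = np.zeros_like(theta)
    for t, (feats, a, _) in enumerate(traj):
        # Gradient of log N(a; theta @ feats, sigma^2) w.r.t. theta,
        # weighted by the discounted return-to-go.
        grad += (gamma ** t) * ((a - theta @ feats) / sigma ** 2) * feats * G[t]
    return theta + lr * grad

for _ in range(200):
    theta = reinforce_update(theta)
```

Because the sampled gradient is an unbiased but noisy estimate of the gradient of a generally non-convex objective in theta, analyses of this scheme connect directly to stochastic non-convex optimization, which is the link the paper's title highlights.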


