Venue: Annual Conference on Information Sciences and Systems

Policy Search in Infinite-Horizon Discounted Reinforcement Learning: Advances through Connections to Non-Convex Optimization : Invited Presentation



Abstract

In reinforcement learning (RL), an agent moving through a state space selects actions that cause a transition to a new state according to an unknown Markov transition density depending on the previous state and action. After each transition, a reward indicating the quality of being in the resulting state is revealed. The goal is to select the action sequence that maximizes the long-term accumulation of rewards, or value. We focus on the case where the policy that determines how actions are chosen is a fixed stationary distribution parameterized by a vector, the problem horizon is infinite, and the states and actions belong to continuous Euclidean subsets.
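The setting the abstract describes (continuous states and actions, a vector-parameterized stochastic policy, infinite-horizon discounted reward) is the one in which policy-gradient methods such as REINFORCE operate. The sketch below is illustrative only and not the paper's algorithm: the linear-Gaussian policy, the toy scalar dynamics, and all hyperparameters are assumptions, with the discounted horizon approximated by a finite rollout.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95          # discount factor of the infinite-horizon objective
theta = np.zeros(2)   # policy parameter vector; action mean = theta @ [s, 1]
sigma = 0.5           # fixed exploration noise (assumed, for illustration)

def step(s, a):
    """Toy linear dynamics with quadratic cost (reward = -cost); a stand-in
    for the unknown Markov transition density in the abstract."""
    s_next = 0.9 * s + a + 0.1 * rng.standard_normal()
    reward = -(s_next ** 2 + 0.1 * a ** 2)
    return s_next, reward

def rollout(theta, T=50):
    """Sample one finite trajectory under the Gaussian policy."""
    s, traj = rng.standard_normal(), []
    for _ in range(T):
        feats = np.array([s, 1.0])
        a = theta @ feats + sigma * rng.standard_normal()
        s, r = step(s, a)
        traj.append((feats, a, r))
    return traj

def reinforce_update(theta, lr=1e-3):
    """One stochastic policy-gradient step on the discounted return."""
    traj = rollout(theta)
    rewards = np.array([r for _, _, r in traj])
    # Discounted returns-to-go: G_t = sum_{k >= t} gamma^(k-t) * r_k
    G, running = np.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running
    grad = np.zeros_like(theta)
    for t, (feats, a, _) in enumerate(traj):
        # Gradient of log N(a; theta @ feats, sigma^2) w.r.t. theta,
        # weighted by the discounted return-to-go.
        grad += (gamma ** t) * ((a - theta @ feats) / sigma ** 2) * feats * G[t]
    return theta + lr * grad

for _ in range(200):
    theta = reinforce_update(theta)
```

Because the sampled gradient is an unbiased but noisy estimate of the gradient of a generally non-convex objective in theta, analyses of this scheme connect directly to stochastic non-convex optimization, which is the link the paper's title highlights.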


