Optimal Time Scales for Reinforcement Learning Behaviour Strategies.

Abstract

Reinforcement Learning is a branch of Artificial Intelligence addressing the problem of single-agent autonomous sequential decision making. It proposes computational models which do not rely on the complete knowledge of the dynamics of stochastic environments. Options are a formalism used to temporally extend actions towards hierarchically organized behaviour, a concept used to improve learning in large-scale problems. In this thesis we propose a new approach for generating options. Given controllers or behaviour policies as prior knowledge, we learn how to switch between these policies by optimizing the expected total discounted reward of the hierarchical behaviour. We derive gradient descent-based algorithms for learning optimal termination conditions of options, based on a new option termination representation. We provide theoretical guarantees and extensions of widely used Reinforcement Learning algorithms when options have variable time-scales. Finally, we incorporate the proposed approach into policy-gradient methods with linear function approximation.
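
The idea of learning when to switch between given behaviour policies can be illustrated with a small sketch. It is not the thesis's algorithm: the sigmoid termination form, the four-state chain, the rule "on termination, switch to the other option", and the finite-difference gradient (standing in for the analytically derived gradients the abstract mentions) are all assumptions chosen for brevity, and every name below is hypothetical.

```python
import math

N_STATES, GOAL, GAMMA = 4, 3, 0.9
LEFT, RIGHT = 0, 1  # option indices; each option wraps one fixed, given policy


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def step(state, option):
    # Deterministic chain (assumed toy environment):
    # LEFT moves toward state 0, RIGHT moves toward GOAL.
    nxt = max(state - 1, 0) if option == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def evaluate(theta, sweeps=300):
    """Expected discounted return when starting in state 0 with option LEFT.

    theta[o][s] parameterizes termination: beta_o(s) = sigmoid(theta[o][s]).
    On termination the agent switches to the other option (assumed rule).
    """
    v = [[0.0] * N_STATES for _ in range(2)]
    for _ in range(sweeps):  # iterative policy evaluation
        for o in (LEFT, RIGHT):
            for s in range(GOAL):  # non-terminal states only
                nxt, r = step(s, o)
                if nxt == GOAL:
                    v[o][s] = r
                else:
                    beta = sigmoid(theta[o][nxt])
                    cont = (1 - beta) * v[o][nxt] + beta * v[1 - o][nxt]
                    v[o][s] = r + GAMMA * cont
    return v[LEFT][0]


def numeric_gradient(theta, eps=1e-4):
    # Central finite differences; a stand-in for an analytic gradient.
    grad = [[0.0] * N_STATES for _ in range(2)]
    for o in range(2):
        for s in range(N_STATES):
            old = theta[o][s]
            theta[o][s] = old + eps
            up = evaluate(theta)
            theta[o][s] = old - eps
            down = evaluate(theta)
            theta[o][s] = old
            grad[o][s] = (up - down) / (2 * eps)
    return grad


def train(steps=50, lr=1.0):
    theta = [[0.0] * N_STATES for _ in range(2)]  # beta = 0.5 everywhere
    before = evaluate(theta)
    for _ in range(steps):
        g = numeric_gradient(theta)
        for o in range(2):
            for s in range(N_STATES):
                theta[o][s] += lr * g[o][s]  # ascent on expected return
    return before, evaluate(theta), theta
```

Calling `train()` returns the expected return before and after the updates together with the learned termination parameters. In this toy chain the unhelpful LEFT option learns to terminate early and the RIGHT option learns to persist, which is the kind of time-scale adaptation the abstract describes.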

Bibliographic record

Author: Comanici, Gheorghe
Author affiliation: McGill University (Canada)
Degree-granting institution: McGill University (Canada)
Subject: Artificial Intelligence; Computer Science
Degree: M.Sc.
Year: 2010
Pages: 102
Original format: PDF
Language: English

Similar literature

1. New Methods for Optimal Operational Control of Industrial Processes Using Reinforcement Learning on Two Time Scales [J]. Xue Wenqian, Fan Jialu, Lopez Victor G. IEEE Transactions on Industrial Informatics, 2020, No. 5.
2. Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes [J]. Jinna Li, Bahare Kiumarsi, Tianyou Chai. IEEE Transactions on Cybernetics, 2017, No. 12.
3. Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning [J]. Mu Chaoxu, Zhao Qian, Gao Zhongke. Journal of the Franklin Institute, 2019, No. 13.
4. Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning [C]. Harsh Gupta, R. Srikant, Lei Ying. Conference on Neural Information Processing Systems, 2020.
5. Scaling up reinforcement learning without sacrificing optimality by constraining exploration [D]. Mann, Timothy Arthur. 2012.
6. Scalable photonic reinforcement learning by time-division multiplexing of laser chaos [O]. Makoto Naruse, Takatomo Mihana, Hirokazu Hori.
7. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-time Popularities [O]. Sadeghi, Alireza; Sheikholeslami, Fatemeh; Giannakis, Georgios B. 2017.