South African Computer Journal

Upper Bounds on the Performance of Discretisation in Reinforcement Learning


Abstract

Reinforcement learning is a machine learning framework in which an agent learns to perform a task by maximising the total reward it receives for the actions it selects in each state. The policy that the agent learns, mapping states to actions, is represented either explicitly or implicitly through a value function. It is common in reinforcement learning to discretise a continuous state space using tile coding or binary features. We prove an upper bound on the performance of discretisation for direct policy representation or value function approximation.
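To illustrate the kind of discretisation the abstract refers to, the following is a minimal sketch of tile coding for a one-dimensional continuous state space. The tile counts, tiling offsets, and state range here are illustrative assumptions, not details taken from the paper.

```python
# Minimal tile-coding sketch: map a continuous state in [low, high]
# to a binary feature vector with exactly one active tile per tiling.
# All parameter values are illustrative assumptions.

def tile_features(state, n_tilings=4, n_tiles=8, low=0.0, high=1.0):
    """Return a binary feature vector of length n_tilings * n_tiles."""
    features = [0] * (n_tilings * n_tiles)
    tile_width = (high - low) / n_tiles
    for t in range(n_tilings):
        # Each tiling is shifted by a fraction of the tile width,
        # so the tilings overlap and generalise between nearby states.
        offset = t * tile_width / n_tilings
        idx = int((state - low + offset) / tile_width)
        idx = min(idx, n_tiles - 1)  # clamp states near the upper edge
        features[t * n_tiles + idx] = 1
    return features

phi = tile_features(0.37)
print(sum(phi))  # one active feature per tiling
```

A linear value-function approximation over such features assigns one weight per tile, so the discretisation determines how finely the policy or value function can distinguish states.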
