Exploiting Multi-Step Sample Trajectories for Approximate Value Iteration.

Abstract

Approximate value iteration methods for reinforcement learning (RL) generalize experience from limited samples across large state-action spaces. The function approximators used in such methods typically introduce errors in value estimation that can harm the quality of the learned value functions. We present a new batch-mode, off-policy, approximate value iteration algorithm called Trajectory Fitted Q-Iteration (TFQI). This approach uses the sequential relationship between samples within a trajectory, a sequence of samples gathered in order from the problem domain, to lessen the adverse influence of approximation errors while deriving long-term value. We provide a detailed description of the TFQI approach and an empirical study that analyzes the impact of our method on two well-known RL benchmarks. Our experiments demonstrate that this approach offers significant benefits, including better learned-policy performance, improved convergence, and reduced sensitivity to the choice of function approximator.
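
For orientation, the sketch below shows one way a batch fitted Q-iteration loop can exploit trajectory structure: regression targets are built by sweeping backward over each trajectory and combining the one-step bootstrapped estimate with a value propagated along the observed transitions. This is a minimal illustrative sketch only; the names (fit_q_iteration, Transition, greedy_values), the choice of ExtraTreesRegressor, and the max-based target rule are assumptions made for the example, not the report's exact TFQI update.

```python
# Illustrative sketch of batch fitted Q-iteration with trajectory-propagated
# targets. All identifiers and the max-based combination rule are assumptions
# for this example; they are not taken from the report.
from dataclasses import dataclass
from typing import List, Optional

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


@dataclass
class Transition:
    state: np.ndarray       # feature vector for s_t
    action: int             # discrete action a_t
    reward: float           # observed reward r_t
    next_state: np.ndarray  # feature vector for s_{t+1}
    terminal: bool          # True if s_{t+1} is absorbing


def fit_q_iteration(trajectories: List[List[Transition]],
                    n_actions: int,
                    gamma: float = 0.99,
                    n_iterations: int = 50) -> ExtraTreesRegressor:
    """Batch, off-policy approximate value iteration over whole trajectories."""
    # Flatten (state, action) pairs once; trajectory boundaries are kept for
    # the backward sweep that builds the multi-step targets.
    all_sa = np.array([np.append(t.state, float(t.action))
                       for traj in trajectories for t in traj])
    regressor: Optional[ExtraTreesRegressor] = None

    def greedy_values(states: np.ndarray) -> np.ndarray:
        """max_a Q_hat(s, a) under the current regressor (zero before the first fit)."""
        if regressor is None:
            return np.zeros(len(states))
        qs = np.empty((len(states), n_actions))
        for a in range(n_actions):
            sa = np.hstack([states, np.full((len(states), 1), float(a))])
            qs[:, a] = regressor.predict(sa)
        return qs.max(axis=1)

    for _ in range(n_iterations):
        targets = []
        for traj in trajectories:
            boot = greedy_values(np.array([t.next_state for t in traj]))
            traj_targets = np.empty(len(traj))
            carried = 0.0  # value carried backward along the trajectory
            for i in reversed(range(len(traj))):
                t = traj[i]
                one_step = t.reward + (0.0 if t.terminal else gamma * boot[i])
                multi_step = t.reward + (0.0 if t.terminal else gamma * carried)
                # Combine the bootstrapped and trajectory-propagated estimates;
                # taking the max is one plausible way to keep approximation
                # errors from dragging down the regression targets.
                traj_targets[i] = max(one_step, multi_step)
                carried = traj_targets[i]
            targets.append(traj_targets)
        regressor = ExtraTreesRegressor(n_estimators=50).fit(
            all_sa, np.concatenate(targets))
    return regressor
```

Taking the larger of the two targets is only one plausible way to let observed in-trajectory returns counteract approximation error; the report itself should be consulted for the actual TFQI rule and its analysis.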
