Discretization and approximation methods for reinforcement learning of highly reconfigurable systems.

Abstract

There are a number of techniques used to solve reinforcement learning problems, but very few have been developed for and tested on highly reconfigurable systems cast as reinforcement learning problems. Reconfigurable systems are vehicles (air, ground, or water) or collections of vehicles that can change their geometrical features, i.e., shape or formation, to perform tasks that the vehicle could not otherwise accomplish. These systems tend to be optimized for several operating conditions, and controllers are then designed to reconfigure the system from one operating condition to another. Q-learning, an unsupervised episodic learning technique that solves the reinforcement learning problem, is an attractive control methodology for reconfigurable systems. It has been successfully applied to a myriad of control problems, and a number of variations have been developed to avoid or alleviate limitations in earlier versions of the approach. This dissertation describes the development of three modular enhancements to the Q-learning algorithm that address some of the unique problems that arise when working with this class of systems, such as the complex interaction of reconfigurable parameters and computationally intensive models of the systems. A multi-resolution state-space discretization method is developed that adaptively rediscretizes the state space using progressively finer grids around one or more distinct Regions Of Interest within the state or learning space. A genetic algorithm that autonomously selects the basis functions used to approximate the action-value function is applied periodically throughout the learning process. Policy comparison is added to monitor the state of the policy encoded in the action-value function and to prevent unnecessary episodes at each level of discretization. This approach is validated on several problems, including an inverted pendulum, a reconfigurable airfoil, and a reconfigurable wing. Results show that the multi-resolution state-space discretization method reduces the number of state-action pairs required to achieve a specific goal, often by an order of magnitude, and that policy comparison prevents unnecessary episodes once the policy has converged to a usable one. Results also show that the genetic algorithm is a promising candidate for selecting basis functions for function approximation of the action-value function.
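The core idea in the abstract, tabular Q-learning on a grid-discretized continuous state space with periodic grid refinement, can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration, not the dissertation's implementation: it performs the standard one-step Q-learning update on a uniform grid and then refines the entire grid, whereas the dissertation refines progressively finer grids only around distinct Regions Of Interest. All class and method names here are invented for illustration.

```python
import numpy as np

class GridQLearner:
    """Tabular Q-learning on a uniform grid discretization of a
    continuous state space, with a crude whole-grid refinement step.
    A hypothetical sketch; names and parameters are illustrative."""

    def __init__(self, low, high, bins, n_actions,
                 alpha=0.1, gamma=0.99, epsilon=0.1):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = np.asarray(bins, dtype=int)     # cells per state dimension
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = np.zeros((*self.bins, n_actions))  # action-value table

    def discretize(self, state):
        # Map a continuous state to the index of its grid cell.
        frac = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def act(self, state, rng):
        # Epsilon-greedy action selection over the current cell's values.
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        return int(np.argmax(self.q[self.discretize(state)]))

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        cell, next_cell = self.discretize(s), self.discretize(s_next)
        td_target = r + self.gamma * np.max(self.q[next_cell])
        self.q[cell + (a,)] += self.alpha * (td_target - self.q[cell + (a,)])

    def refine(self, factor=2):
        # Refinement step: multiply the resolution of every state
        # dimension and seed the finer table from the coarse values, so
        # learning continues instead of restarting. (The dissertation
        # refines only around Regions Of Interest; this refines uniformly.)
        self.bins = self.bins * factor
        for axis in range(len(self.bins)):
            self.q = self.q.repeat(factor, axis=axis)
```

In this sketch, `refine` seeds the finer table from the coarse one by nearest-neighbor copying, which mirrors the motivation stated in the abstract: coarse learning establishes a rough policy cheaply, and finer discretization is paid for only where continued learning needs it.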
Bibliographic Details

  • Author

    Lampton, Amanda Kathryn.

  • Institution

    Texas A&M University.

  • Degree Grantor: Texas A&M University.
  • Subject: Aerospace Engineering.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 284 p.
  • Total Pages: 284
  • Format: PDF
  • Language: English
