Discretization and approximation methods for reinforcement learning of highly reconfigurable systems.

Abstract

There are a number of techniques used to solve reinforcement learning problems, but very few have been developed for and tested on highly reconfigurable systems cast as reinforcement learning problems. Reconfigurable systems are vehicles (air, ground, or water) or collections of vehicles that can change their geometrical features, i.e., shape or formation, to perform tasks that the vehicle could not otherwise accomplish. These systems tend to be optimized for several operating conditions, and controllers are then designed to reconfigure the system from one operating condition to another. Q-learning, an unsupervised episodic learning technique that solves the reinforcement learning problem, is an attractive control methodology for reconfigurable systems. It has been successfully applied to a myriad of control problems, and a number of variations have been developed to avoid or alleviate limitations in earlier versions of the approach. This dissertation describes the development of three modular enhancements to the Q-learning algorithm that address some of the unique problems that arise when working with this class of systems, such as the complex interaction of reconfigurable parameters and computationally intensive models of the systems. A multi-resolution state-space discretization method is developed that adaptively rediscretizes the state space using progressively finer grids around one or more distinct Regions Of Interest within the state or learning space. A genetic algorithm that autonomously selects the basis functions used to approximate the action-value function is applied periodically throughout the learning process. Policy comparison is added to monitor the state of the policy encoded in the action-value function and to prevent unnecessary episodes at each level of discretization. This approach is validated on several problems, including an inverted pendulum, a reconfigurable airfoil, and a reconfigurable wing. Results show that the multi-resolution state-space discretization method reduces the number of state-action pairs required to achieve a specific goal, often by an order of magnitude, and that policy comparison prevents unnecessary episodes once the policy has converged to a usable one. Results also show that the genetic algorithm is a promising candidate for selecting basis functions for function approximation of the action-value function.
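The core idea in the abstract, tabular Q-learning on a grid-discretized continuous state space with periodic grid refinement, can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration, not the dissertation's implementation: it performs the standard one-step Q-learning update on a uniform grid and then refines the entire grid, whereas the dissertation refines progressively finer grids only around distinct Regions Of Interest. All class and method names here are invented for illustration.

```python
import numpy as np

class GridQLearner:
    """Tabular Q-learning on a uniform grid discretization of a
    continuous state space, with a crude whole-grid refinement step.
    A hypothetical sketch; names and parameters are illustrative."""

    def __init__(self, low, high, bins, n_actions,
                 alpha=0.1, gamma=0.99, epsilon=0.1):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = np.asarray(bins, dtype=int)     # cells per state dimension
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = np.zeros((*self.bins, n_actions))  # action-value table

    def discretize(self, state):
        # Map a continuous state to the index of its grid cell.
        frac = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def act(self, state, rng):
        # Epsilon-greedy action selection over the current cell's values.
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        return int(np.argmax(self.q[self.discretize(state)]))

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        cell, next_cell = self.discretize(s), self.discretize(s_next)
        td_target = r + self.gamma * np.max(self.q[next_cell])
        self.q[cell + (a,)] += self.alpha * (td_target - self.q[cell + (a,)])

    def refine(self, factor=2):
        # Refinement step: multiply the resolution of every state
        # dimension and seed the finer table from the coarse values, so
        # learning continues instead of restarting. (The dissertation
        # refines only around Regions Of Interest; this refines uniformly.)
        self.bins = self.bins * factor
        for axis in range(len(self.bins)):
            self.q = self.q.repeat(factor, axis=axis)
```

In this sketch, `refine` seeds the finer table from the coarse one by nearest-neighbor copying, which mirrors the motivation stated in the abstract: coarse learning establishes a rough policy cheaply, and finer discretization is paid for only where continued learning needs it.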
Bibliographic Details

  • Author

    Lampton, Amanda Kathryn.

  • Institution

    Texas A&M University.

  • Degree Grantor: Texas A&M University.
  • Subject: Aerospace Engineering.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 284 p.
  • Total Pages: 284
  • Format: PDF
  • Language: English
