Journal of Process Control

Choice of approximator and design of penalty function for an approximate dynamic programming based control approach



Abstract

This paper investigates the choice of function approximator for an approximate dynamic programming (ADP) based control strategy. The ADP strategy allows the user to derive an improved control policy given a simulation model and some starting control policy (or, alternatively, closed-loop identification data), while circumventing the 'curse of dimensionality' of the traditional dynamic programming approach. In ADP, one fits a function approximator to state vs. 'cost-to-go' data and solves the Bellman equation with the approximator in an iterative manner. A proper choice and design of the function approximator is critical for convergence of the iteration and for the quality of the final learned control policy, because approximation error can grow quickly in the loop of optimization and function approximation. Typical classes of approximators used in related approaches are parameterized global approximators (e.g. artificial neural networks) and nonparametric local averagers (e.g. k-nearest neighbor). In this paper, we assert, on the basis of some case studies and a theoretical result, that a certain type of local averager should be preferred over global approximators, as the former ensures monotonic convergence of the iteration. However, a converged cost-to-go function does not necessarily lead to a stable on-line control policy, due to the problem of over-extrapolation. To cope with this difficulty, we propose that a penalty term be included in the objective function of each minimization to discourage the optimizer from finding solutions in regions of the state space where the local data density is too low. A nonparametric density estimator, which can be naturally combined with a local averager, is employed for this purpose. (c) 2005 Elsevier Ltd. All rights reserved.
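As a rough illustration of the scheme the abstract describes (fitted value iteration with a nonparametric local averager, plus a density-based penalty against over-extrapolation), the sketch below runs the idea on a toy one-dimensional problem. The dynamics f, stage cost phi, data set, neighbor count k, kernel bandwidth, and penalty weight are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def f(x, u):
    return 0.9 * x + u            # assumed linear dynamics (illustration only)

def phi(x, u):
    return x ** 2 + 0.1 * u ** 2  # assumed quadratic stage cost

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=200)   # states visited under a starting policy
U_grid = np.linspace(-1.0, 1.0, 21)    # candidate control actions

def knn_average(x_query, X_data, J_data, k=5):
    """Nonparametric local averager: mean cost-to-go of the k nearest states."""
    idx = np.argsort(np.abs(X_data - x_query))[:k]
    return J_data[idx].mean()

def density_penalty(x_query, X_data, bandwidth=0.2, weight=5.0):
    """Gaussian-kernel density estimate turned into a penalty that grows where
    the local data density is low, discouraging over-extrapolation."""
    dens = np.mean(np.exp(-0.5 * ((X_data - x_query) / bandwidth) ** 2))
    return weight / (dens + 1e-6)

# Fitted value iteration: each sweep performs a one-step Bellman backup at the
# sampled states, evaluating the successor state's cost-to-go with the local
# averager and adding the density penalty inside each minimization.
J = np.zeros_like(X)
for sweep in range(50):
    J_new = np.array([
        min(phi(x, u)
            + knn_average(f(x, u), X, J)
            + density_penalty(f(x, u), X)
            for u in U_grid)
        for x in X
    ])
    if np.max(np.abs(J_new - J)) < 1e-4:
        break
    J = J_new

def policy(x):
    """On-line policy: the same penalized minimization at the current state."""
    return min(U_grid, key=lambda u: phi(x, u)
               + knn_average(f(x, u), X, J)
               + density_penalty(f(x, u), X))

print("u(1.5) =", policy(1.5))
```

The local averager keeps the Bellman backup a non-expansive operation on the stored cost-to-go values, which is the property behind the monotonic-convergence claim, while the penalty simply inflates the apparent cost of successor states that lie far from the data.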


