High-dimensional adaptive dynamic programming with mixed integer linear programming.

Abstract

Dynamic programming (DP; Bellman 1957) is a classic mathematical programming approach for solving multistage decision problems. The "Bellman equation" uses a recursion whose objective function includes both the current contribution and the future contribution of a decision. The method has the potential to represent dynamic decision-making systems, but exact DP solution algorithms are limited to small, restricted problems, such as problems with linear transitions or problems without uncertainty. Approximate dynamic programming (ADP) is a modern branch of DP that seeks numerical solutions via approximation. It can be applied to real-world DP problems, but high-dimensional problems remain challenging. This dissertation focuses on ADP value function approximation for a continuous state space from the statistical perspective (Chen et al. 1999). Two directions of ADP methodology are developed: a sequential algorithm to explore the state space, and a sequential algorithm using mixed integer linear programming (MILP) and regression trees.

The first component addresses exploration of the state space. A sequential state space exploration (SSSE) algorithm (Fan 2008) was previously developed using neural networks; here, the use of multivariate adaptive regression splines (MARS; Friedman 1991) in place of neural networks is considered. In ADP, the value function approximation is defined over a specified state space region, but in the real world the relevant region is unknown. In particular, the ADP approach employed in this dissertation uses a statistical perspective analogous to design and analysis of computer experiments (DACE; Chen et al. 2006). In DACE, an experimental design is used to discretize the input space region, which corresponds to the state space region in ADP. Since the ADP state space region is unknown, SSSE uses stochastic trajectories to sample future states and identify the potential range of the system state. By reducing the number of iterations without impacting solution quality, SSSE with MARS demonstrates improved efficiency over SSSE with neural networks.

The second component of this dissertation addresses the optimization of a real-world, complex, dynamic system. This work is motivated by a case study on the environmental impact of aircraft deicing activities at the Dallas-Fort Worth (D/FW) International Airport. For this case study, the state transitions are nonlinear, the objective function is nonconvex, and the decision (action) space is high-dimensional and discrete. To overcome these complexities, an ADP method is introduced using a piecewise linear value function approximation and MILP. The piecewise linear structure can be transformed into an MILP by introducing binary variables. The treed regression value function approximation is flexible enough to capture the nonlinear structure of the data, yet structured enough to be easily formulated in an MILP. The proposed ADP approach is compared with a reinforcement learning approach (Murphy 2005).
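For reference, the recursion behind the "Bellman equation" mentioned above can be stated in a generic finite-horizon form as follows; the notation is a standard textbook sketch, not necessarily the formulation used in the dissertation:

```latex
% Generic finite-horizon Bellman recursion (illustrative notation):
%   V_t   : value function at stage t
%   s_t   : state,  x_t : decision (action)
%   C_t   : current-stage contribution,  gamma : discount factor
% A maximization form is analogous.
\[
  V_t(s_t) \;=\; \min_{x_t \in \mathcal{X}(s_t)}
    \Big\{ C_t(s_t, x_t) \;+\; \gamma\,
      \mathbb{E}\big[\, V_{t+1}(s_{t+1}) \,\big|\, s_t, x_t \,\big] \Big\}
\]
```

ADP replaces the exact value function V_{t+1} with an approximation (here MARS, treed regression, or neural networks) fitted over a sampled region of the continuous state space.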
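As an illustration of the statement that a piecewise linear value function can be transformed for MILP by introducing binary variables, the sketch below gives the standard convex-combination ("lambda") encoding of a one-dimensional piecewise linear function; it is a generic construction for intuition, not the specific treed-regression formulation developed in the dissertation:

```latex
% Piecewise linear function through breakpoints (s^1,v^1), ..., (s^K,v^K),
% written with interpolation weights lambda_k and segment binaries y_k.
\begin{align*}
  s &= \sum_{k=1}^{K} \lambda_k\, s^k, &
  \hat{V}(s) &= \sum_{k=1}^{K} \lambda_k\, v^k,\\
  \sum_{k=1}^{K} \lambda_k &= 1, \quad \lambda_k \ge 0, &
  \sum_{k=1}^{K-1} y_k &= 1, \quad y_k \in \{0,1\},\\
  \lambda_1 &\le y_1, \quad \lambda_K \le y_{K-1}, &
  \lambda_k &\le y_{k-1} + y_k \quad (k = 2,\dots,K-1).
\end{align*}
% The binaries force the nonzero weights onto one pair of adjacent
% breakpoints, so \hat{V} is linear on exactly one segment even when
% the overall function is nonconvex.
```

Because every relation above is linear in the continuous and binary variables, such an approximate value function can sit inside the MILP that selects the current decision, which is what makes the piecewise linear (treed regression) structure attractive for the nonconvex, discrete-action deicing case study.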

Bibliographic record

  • Author: Zhang, Zirun
  • Author affiliation: The University of Texas at Arlington
  • Degree-granting institution: The University of Texas at Arlington
  • Subject: Operations research; Industrial engineering
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 129 p.
  • Total pages: 129
  • Format: PDF
  • Language: English
  • CLC classification:
  • Keywords:
