首页> 外文学位 >A hybrid genetic/optimization algorithm for piecewise affine and convex Markov decision processes.
【24h】

A hybrid genetic/optimization algorithm for piecewise affine and convex Markov decision processes.

机译:分段仿射和凸马尔可夫决策过程的混合遗传/优化算法。

获取原文
获取原文并翻译 | 示例

摘要

A variety of finite horizon models of sequential decision making under uncertainty have value functions that are piecewise affine and convex ( PAC) or equivalently, can be represented by a finite set of affine functions. Such models include: the partially observed Markov decision problem (POMDP) and the Markov decision problem (MDP) with affine single-step and/or terminal reward function, examples of which are the multi-objective MDP and the nonhomogeneous MDP (NMDP). We develop both optimal and suboptimal solution procedures for this class of models. Our optimal solution approach, GAMIP, integrates a genetic algorithm (GA) and a mixed integer program (MIP). The genetic algorithm step only is a suboptimal design with good solution quality. Both optimal approach and suboptimal design construct the minimal set of affine functions that describe the value function. The POMDP and NMDP are investigated numerically.; The POMDP is a generalization of a MDP that allows for observations that are probabilistically related to the underlying system state. The value function of the finite horizon POMDP is known to be PAC in the probability mass vector over the state space. For suboptimal design, we develop two heuristic approaches based on genetic algorithms, termed as dc-niche and spatial, to construct approximations of the minimal set of affine functions that describes the value function. Error bounds of these two approaches are presented. Numerical results indicate that these heuristics can quickly yield high quality solutions for a set of small-scale problems where exact solutions are feasible and can provide suboptimal designs for realistically sized problems. Numerical results indicate that GAMIP takes up to 60% less time than the most efficient linear programming-based exact solution method in the literature to construct the minimal set.; For the study of an infinite horizon NMDP, the objective is to determine an optimal initial action. We transform this infinite horizon NMDP into a finite horizon NMDP having a terminal reward vector described by set inclusion. Our approach involves finding a forecast horizon, a problem horizon such that an optimal initial decision for the finite horizon problem is optimal for the infinite horizon problem. We make use of the fact that the value function of the finite horizon NMDP having a terminal reward vector described by set inclusion is PAC in the terminal reward vector. Our solution procedure is based on a combination of the genetic algorithm and mixed integer programming. We show that a suboptimal procedure based solely on the genetic algorithm quickly identifies high quality candidates for the minimal forecast horizon. The mixed integer program is used to confirm or rule out a candidate for a forecast horizon. We then present a numerical evaluation that compares our approach with several other algorithms.
机译:不确定条件下的顺序决策的各种有限视野模型具有分段仿射和凸( PAC )或等效的值函数,可以由有限组仿射函数表示。此类模型包括:具有仿射单步和/或最终奖励功能的部分观测的Markov决策问题( POMDP )和Markov决策问题( MDP ),其示例是多目标 MDP 和非同质 MDP NMDP )。我们为此类模型开发了最优和次优的求解程序。我们的最佳解决方案 GAMIP 集成了遗传算法( GA )和混合整数程序( MIP )。遗传算法步骤只是解决方案质量较差的次优设计。最优方法和次优设计都构成了描述值函数的最小仿射函数集。数值研究了 POMDP NMDP POMDP MDP 的概括,它允许与基础系统状态概率相关的观察。在状态空间上的概率质量向量中,有限水平的 POMDP 的值函数已知为 PAC 。对于次优设计,我们基于遗传算法开发了两种启发式方法,分别称为dc小生境和空间算法,以构造描述值函数的最小仿射函数集的近似值。介绍了这两种方法的误差范围。数值结果表明,这些启发式方法可以针对一组小规模问题快速生成高质量的解决方案,其中精确的解决方案是可行的,并且可以为实际大小的问题提供次优设计。数值结果表明, GAMIP 花费的时间比文献中最有效的基于线性规划的精确解方法构造最小集所需的时间少60%。为了研究无限水平的 NMDP ,目标是确定最佳的初始动作。我们将此无限视野 NMDP 转换为有限视野 NMDP ,其具有通过集合包含描述的最终奖励向量。我们的方法涉及找到一个预测范围,即一个问题范围,从而使有限范围问题的最优初始决策对于无限范围问题是最优的。我们利用以下事实:在最终奖励向量中,具有通过集合包含描述的最终奖励向量的有限范围 NMDP 的值函数为 PAC 。我们的解决方案基于遗传算法和混合整数编程。我们表明,仅基于遗传算法的次优过程可以快速识别出最低预测范围的高质量候选对象。混合整数程序用于确认或排除预测范围的候选对象。然后,我们提出了一个数值评估,将我们的方法与其他几种算法进行了比较。

著录项

  • 作者

    Lin, Zong-Zhi.;

  • 作者单位

    University of Michigan.;

  • 授予单位 University of Michigan.;
  • 学科 Operations Research.; Computer Science.
  • 学位 Ph.D.
  • 年度 1999
  • 页码 101 p.
  • 总页数 101
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 运筹学;自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号