Using reinforcement learning to improve network durability.

Abstract

Our goal is to determine and optimize the efficacy of reinforcing an existing flow network to prevent unmet demand caused by imminent disruptions. We are given failure probabilities for the edges of the network and are asked to find the edges whose reinforcement best provides durability to the network post-event. The problem is extended to multiple time steps to address the trade-off between available resources and installation quality: the farther from the event one makes decisions, the more resources are available but the less reliable the uncertainty information. This sequential decision-making process is a classic example of dynamic programming. To avoid the "curses of dimensionality", we formulate an approximate dynamic program. To improve performance, especially as applied to flow networks, we derive several innovative adaptations of reinforcement learning concepts. This involves developing a policy, a function that makes installation decisions given current forecast information, in a two-step process: policy evaluation and policy improvement.

The primary solution technique takes forecast samples from a Monte Carlo simulation in the style of stochastic programming. Once a forecast is obtained, the problem is set up by taking additional samples of the forecast probabilities to determine edge capacities for the given time step; this forms the state information used by the approximate dynamic program. The sampled outcome information is used to define network constraints for the policy improvement step. The approximation of future costs is then refined by comparing the improved policy against a desired target objective, and this process is repeated over several iterations. Lastly, we provide empirical evidence corroborating the basic convergence theorems that hold for simpler forms of the reinforcement learning process.

With a trained policy in hand, we compare its performance against traditional two-stage stochastic programs with recourse that use a sample average approximation model. We consider several implementations of the stochastic problem to gauge performance in a variety of ways. The material presented here is developed in the context of preparing urban infrastructure against damage caused by disasters; however, it is applicable to any flow network. This work contributes to both multistage stochastic programming and approximate dynamic programming by introducing techniques from each field into the other. We also apply reinforcement learning techniques to flow networks in ways that, as of this writing, have not been addressed.
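
As a concrete illustration of the evaluation problem described above, the sketch below estimates expected unmet demand for a candidate reinforcement plan by Monte Carlo simulation of edge failures followed by a max-flow computation. This is a minimal sketch under assumed names and data: the toy network, the failure probabilities, and the function expected_unmet_demand are illustrative, not the dissertation's actual model.

```python
import random

import networkx as nx


def expected_unmet_demand(G, source, sink, demand, fail_prob, reinforced,
                          n_samples=1000, seed=0):
    """Estimate E[max(demand - post-event max flow, 0)] by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        H = nx.DiGraph()
        H.add_nodes_from(G.nodes)
        for u, v, data in G.edges(data=True):
            r = rng.random()  # drawn for every edge so scenarios align across plans
            if (u, v) in reinforced or r >= fail_prob[(u, v)]:
                H.add_edge(u, v, capacity=data["capacity"])
        flow_value, _ = nx.maximum_flow(H, source, sink)
        total += max(demand - flow_value, 0.0)
    return total / n_samples


# Toy network: two parallel s -> t paths, each edge failing with probability 0.3.
G = nx.DiGraph()
G.add_edge("s", "a", capacity=5)
G.add_edge("a", "t", capacity=5)
G.add_edge("s", "b", capacity=5)
G.add_edge("b", "t", capacity=5)
probs = {e: 0.3 for e in G.edges}

# Expected shortfall when one full path is hardened against failure.
print(expected_unmet_demand(G, "s", "t", demand=8, fail_prob=probs,
                            reinforced={("s", "a"), ("a", "t")}))
```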
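
The policy evaluation / policy improvement loop can be sketched schematically as well. The version below maintains a linear value-function approximation and updates it toward sampled one-step targets; the feature map, transition dynamics, and stage cost are hypothetical placeholders standing in for the flow-network formulation, not the author's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(state):
    # Hypothetical feature map; the dissertation's state is far richer.
    return np.asarray(state, dtype=float)

def sample_forecast():
    # Stand-in for a Monte Carlo forecast sample.
    return rng.normal(size=2)

def step(state, action, forecast):
    # Placeholder transition and stage cost for a two-dimensional state.
    nxt = 0.9 * np.asarray(state) + 0.1 * forecast + 0.05 * action
    cost = float(nxt @ nxt) + 0.1 * action
    return nxt, cost

ACTIONS = [0.0, 1.0]       # e.g. "do nothing" vs. "reinforce an edge"
theta = np.zeros(2)        # weights of the approximation V(s) ~ theta . features(s)
alpha, gamma = 0.05, 0.95  # learning rate and discount factor

for _ in range(200):       # repeated iterations, as in the abstract
    state = rng.normal(size=2)
    for _ in range(10):
        forecast = sample_forecast()
        # Policy improvement: act greedily against the current approximation.
        best = min(ACTIONS,
                   key=lambda a: step(state, a, forecast)[1]
                   + gamma * (theta @ features(step(state, a, forecast)[0])))
        state_next, cost = step(state, best, forecast)
        # Policy evaluation: nudge V(state) toward the sampled one-step target.
        target = cost + gamma * (theta @ features(state_next))
        theta += alpha * (target - theta @ features(state)) * features(state)
        state = state_next

print("learned value-function weights:", theta)
```

Greedy selection against the current approximation plays the role of policy improvement here, while the temporal-difference update toward the sampled target plays the role of policy evaluation.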

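A sample average approximation benchmark in the spirit of the two-stage comparison can be sketched by enumerating small first-stage reinforcement sets and averaging second-stage shortfall over a common scenario batch. This continues the first sketch (it reuses expected_unmet_demand, G, and probs); the reinforcement cost, penalty, and budget are illustrative assumptions.

```python
from itertools import combinations

REINFORCE_COST = 1.0   # assumed cost per reinforced edge
PENALTY = 10.0         # assumed penalty per unit of unmet demand

def saa_plan(G, source, sink, demand, fail_prob, budget, n_scenarios=500):
    """Enumerate reinforcement sets of up to `budget` edges and pick the
    first-stage plan minimizing the sample-average objective."""
    edges = list(G.edges)
    best_plan, best_obj = frozenset(), float("inf")
    for k in range(budget + 1):
        for plan in combinations(edges, k):
            # seed=1 fixes a common scenario batch across all candidate plans
            # (the usual common-random-numbers device in SAA comparisons).
            shortfall = expected_unmet_demand(G, source, sink, demand,
                                              fail_prob, set(plan),
                                              n_samples=n_scenarios, seed=1)
            obj = REINFORCE_COST * k + PENALTY * shortfall
            if obj < best_obj:
                best_plan, best_obj = frozenset(plan), obj
    return best_plan, best_obj

plan, obj = saa_plan(G, "s", "t", demand=8, fail_prob=probs, budget=2)
print("chosen edges:", sorted(plan), "objective:", round(obj, 2))
```
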
Bibliographic details

  • Author: Hammel, Erik
  • Author affiliation: Rensselaer Polytechnic Institute
  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subjects: Applied Mathematics; Artificial Intelligence; Operations Research
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 147
  • Format: PDF
  • Language: English
