Journal: Structural Safety
Model-free reinforcement learning with model-based safe exploration: Optimizing adaptive recovery process of infrastructure systems


Abstract

Extreme events are not only among the most damaging events for our society and environment, but also among the most difficult to predict. Model-based predictions of the disruptions that extreme events induce in urban infrastructure systems are often unreliable, because such events are rare by definition. In particular, characterizing the effect of these disruptions on urban infrastructure with a parameterized model is difficult. Model-free approaches based on recent advances in reinforcement learning, by contrast, can explicitly model the complex dynamics of urban society and infrastructure under the risk of extreme events without relying on any specific physics-based mechanism. However, these approaches usually require random exploration of the effects of management actions on the system (typically in the post-event situation) to reach an acceptable approximation of the optimal management policy. For costly infrastructure systems and important communities, this random exploration can be unacceptably risky. In this paper, we propose a method called Safe Q-learning, a model-free reinforcement learning approach augmented with model-based safe exploration, for near-optimal management of infrastructure systems pre-event and of their recovery post-event. Our method requires the decision-maker to model the structure of the state space of the problem and a suitable equilibrium of the system (optimum functionality pre-event). This information is usually available for urban systems, as they spend a long time in optimum equilibrium before such events occur. We show on several infrastructure management examples how the proposed approach achieves near-optimal performance without the risk due to random exploration.
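
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea (tabular Q-learning whose exploration is restricted by a model of the state space and a known equilibrium). It is not the paper's Safe Q-learning algorithm. Everything in it is an assumption made for illustration: the five-level functionality chain, the `HOLD`/`REPAIR`/`DIVERT` actions, the reward, the repair cost, and the rule that an action counts as "safe" when the model predicts it does not move the system farther from equilibrium.

```python
import random

N_STATES = 5                 # functionality levels 0..4 (toy system, assumed)
EQUILIBRIUM = N_STATES - 1   # assumed known pre-event optimum
HOLD, REPAIR, DIVERT = 0, 1, -1   # hypothetical actions; DIVERT degrades the system
ACTIONS = (HOLD, REPAIR, DIVERT)
REPAIR_COST = 0.1            # assumed cost of one repair action


def predict(state, action):
    """Decision-maker's model of the state space: move one level, clamped."""
    return min(max(state + action, 0), N_STATES - 1)


def distance(state):
    """Number of functionality levels between a state and the equilibrium."""
    return abs(EQUILIBRIUM - state)


def safe_actions(state):
    """Keep only actions predicted not to move the system away from equilibrium."""
    return [a for a in ACTIONS if distance(predict(state, a)) <= distance(state)]


def step(state, action):
    """Toy environment (here identical to the model): next state and reward."""
    next_state = predict(state, action)
    reward = -distance(next_state) - (REPAIR_COST if action == REPAIR else 0.0)
    return next_state, reward


def train(episodes=500, horizon=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(N_STATES)          # random post-event damage level
        for _ in range(horizon):
            allowed = safe_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(allowed)     # explore, but only inside the safe set
            else:
                action = max(allowed, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in safe_actions(next_state))
            # Standard tabular Q-learning update.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best safe action per state according to the learned Q-table."""
    return {s: max(safe_actions(s), key=lambda a: q[(s, a)]) for s in range(N_STATES)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the agent never executes an action that the model predicts would reduce functionality, so exploration costs at most some wasted time, not additional damage. A real application would need a far richer state space and a model that only approximates the true dynamics.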
