Engineering Applications of Artificial Intelligence

Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

Abstract

The dynamic multi-objective optimisation problem (DMOP) poses a great challenge to the reinforcement learning (RL) research area because of its dynamic nature: objective functions, constraints and problem parameters may change over time. This study aims to identify what is lacking in the existing benchmarks for multi-objective optimisation in dynamic environments under RL settings. Hence, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed satisfies the changing aspects of a dynamic environment, in that its characteristics change over time. To the authors' knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment; the algorithm maintains equilibrium by mapping different objectives simultaneously, providing the best compromise solution, i.e. the one closest to the true Pareto front (PF). As a proof of concept, the developed algorithm has been implemented to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in Sao Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q-network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.