Engineering Applications of Artificial Intelligence

Dynamic multi-objective optimisation using deep reinforcement learning: benchmark, algorithm and an application to identify vulnerable zones based on water quality

Abstract

The dynamic multi-objective optimisation problem (DMOP) poses a great challenge to the reinforcement learning (RL) research area because of its dynamic nature: objective functions, constraints and problem parameters may change over time. This study aims to identify what is lacking in the existing benchmarks for multi-objective optimisation in dynamic environments under RL settings. Hence, a dynamic multi-objective testbed has been created as a modified version of the conventional deep-sea treasure (DST) hunt testbed. The modified testbed satisfies the changing aspects of a dynamic environment, in that its characteristics change over time. To the authors' knowledge, this is the first dynamic multi-objective testbed for RL research, especially for deep reinforcement learning. In addition, a generic algorithm is proposed to solve the multi-objective optimisation problem in a dynamic constrained environment; the algorithm maintains equilibrium by mapping different objectives simultaneously, providing the best compromise solution, i.e. the one closest to the true Pareto front (PF). As a proof of concept, the developed algorithm has been implemented to build an expert system for a real-world scenario, using a Markov decision process to identify vulnerable zones based on water quality resilience in Sao Paulo, Brazil. The outcome of the implementation reveals that the proposed parity-Q deep Q-network (PQDQN) algorithm is an efficient way to optimise decisions in a dynamic environment. Moreover, the results show that the PQDQN algorithm performs better than other state-of-the-art solutions in both the simulated and the real-world scenarios.