This study addresses the optimization problem of persistent coverage of a target region by unmanned aerial vehicles (UAVs). A deep reinforcement learning (DRL) algorithm based on bidirectional recurrent neural networks (BRNN) is proposed to obtain the optimal control policy for the UAVs. This policy steers the UAVs to periodically cover the whole target region while minimizing the maximum age of its cells. The UAVs coordinate autonomously by means of wonderful life utility (WLU) functions and the BRNN. Because all UAVs' control policies share parameters, the algorithm is robust and scalable: individual UAVs can freely join or leave the task without affecting the operation of the entire system. The algorithm uses consistent outputs to control multiple heterogeneous UAVs. Simulation results are given to illustrate the effectiveness of the proposed method. (c) 2022 Elsevier B.V. All rights reserved.
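To make "age of cells" and the WLU concrete, here is a minimal sketch under stated assumptions. It is not the paper's formulation. The region is assumed to be a grid of integer cell ages. Every cell ages by 1 per step, and a cell inside a UAV's square footprint (half-width `radius`) is reset to 0. The global utility G is assumed to be the negative maximum cell age, matching the objective stated above. Following the usual definition of the wonderful life utility, a UAV's WLU is G with that UAV present minus G with that UAV removed, evaluated here over a single step. All function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]   # cell ages, indexed [row][col]
Pos = Tuple[int, int]    # (row, col) of a UAV's footprint centre


def step_ages(ages: Grid, positions: List[Pos], radius: int) -> Grid:
    """Advance one time step: all cells age by 1, covered cells reset to 0.

    Assumption: each UAV covers a (2*radius+1) x (2*radius+1) square,
    clipped at the region boundary.
    """
    rows, cols = len(ages), len(ages[0])
    new = [[a + 1 for a in row] for row in ages]
    for r, c in positions:
        for rr in range(max(r - radius, 0), min(r + radius + 1, rows)):
            for cc in range(max(c - radius, 0), min(c + radius + 1, cols)):
                new[rr][cc] = 0
    return new


def global_utility(ages: Grid) -> int:
    """Assumed global utility G: the negative of the maximum cell age."""
    return -max(max(row) for row in ages)


def wonderful_life_utility(ages: Grid, positions: List[Pos], radius: int, i: int) -> int:
    """WLU of UAV i: G with UAV i present minus G with UAV i removed."""
    g_all = global_utility(step_ages(ages, positions, radius))
    others = positions[:i] + positions[i + 1:]
    g_without = global_utility(step_ages(ages, others, radius))
    return g_all - g_without


if __name__ == "__main__":
    row = [[3, 3, 3, 3]]
    # Each UAV covers a distinct cell, so removing UAV 0 lets cell (0, 0) age.
    print(wonderful_life_utility(row, [(0, 0), (0, 1), (0, 2), (0, 3)], 0, 0))
    # A duplicated UAV contributes nothing beyond its twin.
    print(wonderful_life_utility(row, [(0, 0), (0, 0), (0, 1), (0, 2), (0, 3)], 0, 0))
```

The difference structure rewards a UAV only for the improvement it alone causes, so redundant coverage earns nothing. With a max-age utility, a UAV also earns zero whenever some uncovered cell dominates the maximum. This is one reason a learned, coordinated policy is needed rather than greedy individual choices.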