Unmanned surface vessel obstacle avoidance with prior knowledge-based reward shaping

Wang Wei; Luo Xiangfeng; Li Yang; Xie Shaorong

Abstract

Autonomous obstacle avoidance control of unmanned surface vessels (USVs) in complex marine environments is fundamental to their use in scientific search and detection. Traditional methods usually build mathematical models of USV motion and of the environment, and these models depend on perceptual information. In complex marine environments it is difficult to obtain sufficient perceptual information, so the resulting models are inaccurate. Reinforcement learning has recently become increasingly popular for obstacle avoidance because it can solve problems using only partially observable environment information. However, autonomous USV obstacle avoidance based on reinforcement learning still faces the challenge of designing appropriate reward functions for complex marine environments. To address this challenge, we propose a prior knowledge-based reinforcement learning algorithm for USV obstacle avoidance. The algorithm uses an actor-critic network as its main architecture, and prior knowledge-based reward shaping is used to design the reward function for USV obstacle avoidance. A standard USV equipped with a visual sensor is designed, and the state input of the algorithm comes from the USV's front vision sensor. We conducted simulation experiments, and the results show that our algorithm converges effectively and that the USV achieves high velocity and a low collision rate in the complex marine environment.
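
The abstract does not give the reward function itself, so the following is only a minimal sketch of one common way to inject prior knowledge into a reward: generic potential-based reward shaping, r' = r + γ·Φ(s') − Φ(s). It is not the authors' formulation. The state layout `(forward_speed, nearest_obstacle_distance)`, the potential `potential`, the weights, `SAFE_RADIUS`, and `GAMMA` are all hypothetical choices made for illustration; the paper's agent takes visual sensor input rather than these scalars.

```python
from typing import Tuple

# Hypothetical state: (forward speed, distance to the nearest obstacle).
State = Tuple[float, float]

GAMMA = 0.99        # discount factor (assumed value)
SAFE_RADIUS = 5.0   # distance below which obstacles are penalised (assumed)
W_SPEED = 1.0       # weight on forward speed (assumed)
W_OBSTACLE = 1.0    # weight on obstacle proximity (assumed)


def potential(state: State) -> float:
    """Hypothetical prior-knowledge potential: prefer speed, avoid proximity."""
    speed, obstacle_distance = state
    intrusion = max(0.0, SAFE_RADIUS - obstacle_distance)
    return W_SPEED * speed - W_OBSTACLE * intrusion


def shaped_reward(env_reward: float, state: State, next_state: State,
                  gamma: float = GAMMA) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return env_reward + gamma * potential(next_state) - potential(state)


if __name__ == "__main__":
    # Speeding up in open water yields a positive shaping bonus.
    print(shaped_reward(0.0, (2.0, 8.0), (3.0, 8.0)))
    # Drifting inside the safe radius yields a negative shaping term.
    print(shaped_reward(0.0, (2.0, 6.0), (2.0, 3.0)))
```

In a sketch like this, the shaped reward would replace the raw environment reward in the critic's update target, while the actor-critic architecture itself stays unchanged.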

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2021, Issue 9 | e6110.1-e6110.13 | 13 pages
Authors
Wang Wei; Luo Xiangfeng; Li Yang; Xie Shaorong
Author affiliations

School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China (all four authors)

Original format: PDF
Language: English
Keywords
complex dynamic environment; obstacle avoidance; prior knowledge; reinforcement learning; reward shaping; unmanned surface vessel;
