PATH PLANNING METHOD AND SYSTEM BASED ON COMBINATION OF SAFETY EVACUATION SIGNS AND REINFORCEMENT LEARNING

Abstract

The present disclosure provides a path planning method and system that combine safety evacuation signs with reinforcement learning. The path planning method comprises: establishing and rasterizing a two-dimensional simulation scenario model, and initializing obstacles, agents and safety evacuation signs in that model; and performing path planning using the safety evacuation signs together with a Q-Learning algorithm. Specifically, the Q values corresponding to the respective agents in a Q value table are initialized to 0; the state information of each agent at the current moment is acquired, a corresponding reward is calculated, and the action with the larger corresponding Q value is selected to move each agent; the instant reward of each agent at its new location is calculated, the Q value table is updated, and it is judged whether the Q value table has converged. If it has, an optimal path sequence is obtained; otherwise, the input environmental information sent by each agent, together with its corresponding state, action, reward and output environmental information, is received and aggregated, the aggregated information is distributed to each agent, and the agents continue to move.
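
The loop described in the abstract can be illustrated with a minimal tabular Q-Learning sketch in Python. Everything specific here is an assumption made for illustration and does not come from the patent: the 5x5 grid layout, the reward values (exit +100, evacuation sign +1, ordinary step -1, collision -5), the hyperparameters, the epsilon-greedy exploration, and the function names `step`, `train` and `greedy_path`. The aggregation and distribution step is approximated, as a simplification, by letting all agents read and write one shared Q table.

```python
import random
from collections import defaultdict

# Hypothetical rasterized scene: '#' obstacle, 'S' evacuation sign, 'E' exit, '.' free.
GRID = [
    "....E",
    ".##.S",
    ".#...",
    ".#.#.",
    "...#.",
]
EXIT = (0, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2         # assumed hyperparameters


def step(state, a):
    """Apply action index a; return (next_state, reward, done)."""
    nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
    if not inside or GRID[nr][nc] == "#":
        return state, -5.0, False          # blocked: stay in place (assumed penalty)
    if GRID[nr][nc] == "E":
        return (nr, nc), 100.0, True       # reached the exit
    if GRID[nr][nc] == "S":
        return (nr, nc), 1.0, False        # assumed bonus for following a sign
    return (nr, nc), -1.0, False           # ordinary step cost


def greedy(q, state):
    """Index of the action with the largest Q value in this state."""
    return max(range(len(ACTIONS)), key=lambda a: q[(state, a)])


def train(starts, episodes=2000, tol=1e-4, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                 # every Q value starts at 0
    for _ in range(episodes):
        max_delta = 0.0
        for start in starts:               # all agents update one shared table
            state = start
            for _ in range(200):           # step cap per episode
                explore = rng.random() < EPSILON
                a = rng.randrange(len(ACTIONS)) if explore else greedy(q, state)
                nxt, reward, done = step(state, a)
                best_next = 0.0 if done else max(
                    q[(nxt, b)] for b in range(len(ACTIONS)))
                delta = ALPHA * (reward + GAMMA * best_next - q[(state, a)])
                q[(state, a)] += delta
                max_delta = max(max_delta, abs(delta))
                state = nxt
                if done:
                    break
        if max_delta < tol:                # simple convergence test on the table
            break
    return q


def greedy_path(q, start, limit=50):
    """Follow the greedy policy from start until the exit or the step limit."""
    path, state = [start], start
    for _ in range(limit):
        if state == EXIT:
            break
        state, _, _ = step(state, greedy(q, state))
        path.append(state)
    return path


q_table = train([(4, 0), (4, 4), (2, 2)])
print(greedy_path(q_table, (4, 0)))
```

The convergence test used here (largest update in a training round falling below a tolerance) is only one plausible reading of "judging whether the Q value table converges"; the source does not specify the criterion.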
