International Conference on Dynamic Data Driven Applications Systems

Physics-Driven Machine Learning for Time-Optimal Path Planning in Stochastic Dynamic Flows



Abstract

Optimal path planning of autonomous marine agents is important for minimizing the operational costs of ocean observation systems. Within the context of DDDAS, we present a Reinforcement Learning (RL) framework for computing a dynamically adaptable policy that minimizes the expected travel time of autonomous vehicles between two points in stochastic dynamic flows. To forecast the stochastic dynamic environment, we utilize the reduced-order, data-driven dynamically orthogonal (DO) equations. For planning, a novel physics-driven online Q-learning scheme is developed. First, the distribution of exact time-optimal paths predicted by stochastic DO Hamilton-Jacobi level-set partial differential equations is utilized to initialize the action-value function (Q-value) in a transfer-learning approach. Next, the flow data collected by onboard sensors are utilized in a feedback loop to adaptively refine the optimal policy. For the adaptation, a simple Bayesian estimate of the environment is performed (the DDDAS data-assimilation loop), and the inferred environment is used to update the Q-values in an ε-greedy exploration approach (the RL step). To validate our Q-learning solution, we compare it with a fully offline, dynamic programming solution of the Markov Decision Problem corresponding to the RL framework. For this, novel numerical schemes that efficiently utilize the DO forecasts are derived, and a computationally efficient GPU implementation is completed. We showcase the new RL algorithm and elucidate its computational advantages by planning paths in a stochastic quasi-geostrophic double-gyre circulation.
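The two RL ingredients the abstract names — a Q-value table initialized from a prior time-to-go estimate (transfer learning) and an online ε-greedy Q-learning refinement under a stochastic flow — can be illustrated with a minimal sketch. This is not the paper's implementation: the 1-D corridor, the random drift standing in for the DO flow forecast, and the no-flow travel-time prior standing in for the Hamilton-Jacobi level-set initialization are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 1-D corridor of N cells; the agent starts at cell 0 and
# must reach cell N-1. A random drift perturbs each move, standing in
# for the stochastic dynamic flow. All names here are illustrative.
rng = np.random.default_rng(0)
N = 10
ACTIONS = [-1, +1]          # move left / move right
GOAL = N - 1
GAMMA = 1.0                 # undiscounted: we minimize expected travel time

def step(s, a):
    """One transition: unit time cost, action perturbed by a random current."""
    drift = rng.choice([-1, 0, 0, 1])        # stochastic flow sample
    s2 = int(np.clip(s + a + drift, 0, N - 1))
    return s2, -1.0                          # reward = negative travel time

# Transfer-learning-style initialization: seed Q with a prior time-to-go
# estimate (here, the no-flow travel time |GOAL - s'|), playing the role
# of the stochastic HJ level-set distribution in the paper.
Q = np.zeros((N, len(ACTIONS)))
for s in range(N):
    for ai, a in enumerate(ACTIONS):
        s_next = int(np.clip(s + a, 0, N - 1))
        Q[s, ai] = -abs(GOAL - s_next) - 1.0
Q[GOAL, :] = 0.0

# Online epsilon-greedy Q-learning refinement (the RL step).
ALPHA, EPS = 0.2, 0.1
for episode in range(500):
    s = 0
    for _ in range(200):
        if s == GOAL:
            break
        if rng.random() < EPS:
            ai = int(rng.integers(len(ACTIONS)))   # explore
        else:
            ai = int(np.argmax(Q[s]))              # exploit
        s2, r = step(s, ACTIONS[ai])
        target = r + GAMMA * (0.0 if s2 == GOAL else np.max(Q[s2]))
        Q[s, ai] += ALPHA * (target - Q[s, ai])
        s = s2

policy = [ACTIONS[int(np.argmax(Q[s]))] for s in range(N)]
```

In the paper's setting, the Bayesian estimate of the flow inferred from onboard sensor data (the DDDAS loop) would replace the fixed `drift` distribution between updates; the offline dynamic-programming baseline would instead solve the corresponding Markov Decision Problem by value iteration over the DO forecast ensemble.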


