IEEE/AIAA Digital Avionics Systems Conference

Policy Optimization in Automated Point Merge Trajectory Planning: An Artificial Intelligence-based Approach

Abstract

Air Traffic Management (ATM) is a complex decision-making process. Air traffic controllers' decisions on aircraft trajectory control actions directly determine the efficiency of traffic flow management. The Automated Point Merge Trajectory Planning (APMTP) problem aims to automate routine trajectory management in the Terminal Manoeuvring Area (TMA) using an intelligent decision-making agent. An Artificial Intelligence-based approach, chiefly a Reinforcement Learning (RL) algorithm, is applied to adaptively and intelligently integrate four types of de-conflict actions, resolving conflicts in the environment with fewer delays. In this paper, we mainly discuss policy optimization in APMTP, focusing on improving the agent's learning quality and exploration efficiency. First, the application of RL to adaptive trajectory planning is presented. The APMTP problem is adaptively divided into several sub-problems. For each sub-problem, an online policy π guides the simulation and optimization modules toward a conflict-free, lower-delay solution. The online policy π is a weight distribution used to choose desirable actions, and actions are drawn by roulette-wheel selection with weighted probabilities: the most desirable decision variable has the largest share of the roulette wheel, while the least desirable decision variable has the smallest share. An RL direct policy optimization algorithm is designed to update the online policy π. Finally, experiments are conducted to validate the proposed policy optimization algorithm for intelligent decision-making in APMTP. The results in the test environment show that learning agents with different exploration and exploitation abilities yield different system performance in conflict resolution and delay.
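The roulette-wheel selection over the online policy π can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four de-conflict action types are represented only by indices 0 to 3 (the abstract does not name them), and because the abstract does not specify its direct policy optimization rule, the update step uses a standard gradient-bandit (softmax policy-gradient) update as a plainly labeled stand-in. The names `roulette_wheel_select`, `GradientBanditPolicy`, `step_size`, and the reward signal in the usage section are hypothetical.

```python
import math
import random
from typing import List, Sequence


def roulette_wheel_select(weights: Sequence[float], rng: random.Random) -> int:
    """Pick an index with probability proportional to its weight.

    Each action owns a slice of the wheel sized by its weight, so the most
    desirable action has the largest share and the least desirable the smallest.
    """
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    spin = rng.random() * total  # where the wheel stops, in [0, total)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if spin < cumulative:
            return index
    # Floating-point round-off fallback: last action that owns a slice.
    return max(i for i, w in enumerate(weights) if w > 0)


def softmax(preferences: Sequence[float]) -> List[float]:
    """Turn unconstrained preferences into a probability distribution."""
    top = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(h - top) for h in preferences]
    z = sum(exps)
    return [e / z for e in exps]


class GradientBanditPolicy:
    """Hypothetical stand-in learner (gradient-bandit update), NOT the paper's algorithm.

    pi = softmax(preferences) is the weight distribution fed to the roulette
    wheel; step_size controls how quickly pi concentrates on rewarded actions.
    """

    def __init__(self, n_actions: int, step_size: float = 0.1) -> None:
        self.preferences = [0.0] * n_actions
        self.step_size = step_size
        self.baseline = 0.0  # running average of earlier rewards
        self.updates = 0

    def policy(self) -> List[float]:
        return softmax(self.preferences)

    def act(self, rng: random.Random) -> int:
        return roulette_wheel_select(self.policy(), rng)

    def update(self, action: int, reward: float) -> None:
        pi = self.policy()
        advantage = reward - self.baseline  # compare against earlier rewards
        for a in range(len(self.preferences)):
            indicator = 1.0 if a == action else 0.0
            # Gradient of log pi(action) w.r.t. preference a is (indicator - pi[a]).
            self.preferences[a] += self.step_size * advantage * (indicator - pi[a])
        self.updates += 1
        self.baseline += (reward - self.baseline) / self.updates


if __name__ == "__main__":
    rng = random.Random(0)
    learner = GradientBanditPolicy(n_actions=4, step_size=0.1)  # 4 de-conflict action types
    for _ in range(1000):
        a = learner.act(rng)
        # Hypothetical reward: pretend action 2 resolves conflicts with the least delay.
        reward = 1.0 if a == 2 else 0.0
        learner.update(a, reward)
    print([round(p, 3) for p in learner.policy()])
```

In this sketch a larger `step_size` shrinks the slices of poorly rewarded actions sooner (more exploitation), while a smaller one keeps the wheel closer to uniform for longer (more exploration). This loosely mirrors the abstract's observation that agents with different exploration and exploitation abilities perform differently, but the actual trade-off in APMTP depends on the paper's own update rule and reward design.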