IEEE/AIAA Digital Avionics Systems Conference

Policy Optimization in Automated Point Merge Trajectory Planning: An Artificial Intelligence-based Approach

Abstract

Air Traffic Management (ATM) is a complex decision-making process. Air traffic controllers' decisions on aircraft trajectory control actions directly affect the efficiency of traffic flow management. The Automated Point Merge Trajectory Planning (APMTP) problem aims to automate routine trajectory management in the Terminal Manoeuvring Area (TMA) with an intelligent decision-making agent. An Artificial Intelligence-based approach, mainly a Reinforcement Learning (RL) algorithm, is applied to adaptively integrate four types of de-conflict actions so that conflicts are resolved with fewer delays in the environment. This paper mainly discusses policy optimization in APMTP, focusing on improving the agent's learning quality and exploration efficiency. First, the application of RL to adaptive trajectory planning is presented. The APMTP problem is adaptively divided into several sub-problems. For each sub-problem, an online policy π guides the simulation and optimization modules toward a conflict-free, lower-delay solution. The online policy π is a weight distribution over candidate actions, and actions are drawn by roulette-wheel selection with weighted probability: the most desirable decision variable has the largest share of the roulette wheel, and the least desirable has the smallest share. An RL direct policy optimization algorithm is designed to update the online policy π. Finally, experiments are set up to validate the proposed policy optimization algorithm for intelligent decision-making in APMTP. Results in the test environment show that learning agents with different exploration and exploitation abilities yield different system performance in conflict resolution and delay.
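
The roulette-wheel selection rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `roulette_wheel_select`, the placeholder action labels, and the example weights are all assumptions made for illustration, since the abstract does not name the four de-conflict actions or give the actual weights of π.

```python
import random
from itertools import accumulate


def roulette_wheel_select(weights, rng=random):
    """Return an index drawn with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Cumulative sums mark the right edge of each option's slice of the wheel.
    edges = list(accumulate(weights))
    spin = rng.random() * total  # uniform in [0, total)
    for index, edge in enumerate(edges):
        if spin < edge:
            return index
    # Floating-point round-off guard: fall back to the last positive-weight option.
    return max(i for i, w in enumerate(weights) if w > 0)


# Hypothetical policy: placeholder labels and weights, not values from the paper.
policy = {"action_1": 0.4, "action_2": 0.3, "action_3": 0.2, "action_4": 0.1}
labels = list(policy)
chosen = labels[roulette_wheel_select(list(policy.values()))]
```

Because each draw is random, a low-weight action is still chosen occasionally, which is what lets the agent keep exploring; raising or lowering a weight shifts the balance between exploration and exploitation. How the paper's algorithm actually updates these weights is not specified in the abstract and is not modeled here.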