Dynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning

Abstract

Conference paper. This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved with Reinforcement Learning (RL), using the actor-critic policy gradient approach. The actor provides a stochastic policy over actions, given by a parametric probability density function (pdf). The critic evaluates the policy by estimating the total reward, that is, the value function of the problem. The parameters of the policy pdf are optimized using gradients of the expected total reward with respect to those parameters. Both the critic and the actor are modeled as deep (multi-layer) neural networks. The policy network takes the current state as input and outputs a probability for each possible action. Because the policy is stochastic, it is evaluated by sampling actions according to these probabilities. The critic network approximates the total reward, and this estimate is used to approximate the policy gradient with respect to the policy network's parameters.

This approach is used to find a non-myopic optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty of the overall catalog to below a user-specified threshold, set in this work to a total position error of 30 km. The RL method receives a negative reward as long as any SO has a total position error above the threshold, which penalizes policies that take longer to reach the desired accuracy, and a positive reward once all SOs are below the threshold. The policy sought is therefore one that reaches the desired catalog uncertainty in minimum time.

The policy is trained in simulation by letting it task a single sensor and "learn" from its performance. The proposed approach to the SM problem is tested in simulation, and the actor-critic policy gradient method is found to perform well.
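The reward rule and the actor-critic update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The paper models the actor and critic with deep neural networks, while this sketch swaps in linear function approximation with a softmax policy to stay short. The reward magnitudes (-1 and +1), the learning rates, the discount factor, the use of the one-step TD error as the critic's signal, and all function names are assumptions made here for illustration. Only the 30 km threshold and the sign convention of the reward come from the abstract.

```python
import math
import random

THRESHOLD_KM = 30.0  # catalog uncertainty threshold stated in the abstract


def catalog_reward(position_errors_km, threshold_km=THRESHOLD_KM):
    """Negative reward while any object exceeds the threshold, else positive.

    The magnitudes -1.0 and +1.0 are assumed; the abstract gives only the signs.
    """
    if any(err > threshold_km for err in position_errors_km):
        return -1.0
    return 1.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(theta, features):
    """Actor: linear logits per action, turned into probabilities."""
    logits = [sum(t * f for t, f in zip(row, features)) for row in theta]
    return softmax(logits)


def sample_action(theta, features, rng):
    """Sample an action index from the stochastic policy."""
    probs = action_probabilities(theta, features)
    return rng.choices(range(len(probs)), weights=probs)[0]


def actor_critic_step(theta, w, features, next_features, action, reward,
                      done, gamma=0.99, actor_lr=0.05, critic_lr=0.1):
    """One-step actor-critic update (assumed variant), in place.

    theta[a][i] are actor weights, w[i] are critic weights.
    Returns the TD error used to scale both updates.
    """
    value = sum(wi * fi for wi, fi in zip(w, features))
    next_value = 0.0
    if not done:
        next_value = sum(wi * fi for wi, fi in zip(w, next_features))
    td_error = reward + gamma * next_value - value
    probs = action_probabilities(theta, features)

    # Critic: move the value estimate toward the one-step target.
    for i, fi in enumerate(features):
        w[i] += critic_lr * td_error * fi

    # Actor: grad of log pi(a|s) w.r.t. theta[b] is (1[a == b] - pi_b) * features.
    for b, row in enumerate(theta):
        indicator = 1.0 if b == action else 0.0
        for i, fi in enumerate(features):
            row[i] += actor_lr * td_error * (indicator - probs[b]) * fi
    return td_error
```

In a training loop, `features` would be some encoding of the catalog state (for example, each object's position error divided by the threshold), `sample_action` would choose which object the sensor observes next, and `catalog_reward` would supply the reward passed to `actor_critic_step`. That encoding is likewise an assumption, since the abstract does not specify the state representation.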

Bibliographic record

Source
Advanced Maui Optical and Space Surveillance Technologies Conference | 2017 | 775 p. | 10 pages
Authors
Richard Linares; Roberto Furfaro
Original format: PDF
Chinese Library Classification: O43-532
Keywords
MDP; SM; RL
Date added to database: 2022-08-20 23:23:29

