IAF Astrodynamics Symposium; International Astronautical Congress

Reinforcement Learning for Spacecraft Attitude Control

Abstract

Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in system dynamics. Our RL based attitude control agent adapts to any spacecraft mass without needing to re-train. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to a QRF controller tuned specifically for a given mass. The trained RL agent for the reaction wheel based ACS achieved 10 times better reward than that of a tuned QRF controller.
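The QRF baseline named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gains `k_p` and `k_d` and the saturation limit `tau_max` are arbitrary placeholder values, and the quaternion convention assumed here is `[x, y, z, w]` with the scalar part last.

```python
import numpy as np

def qrf_torque(q_err, omega, k_p=0.1, k_d=0.5, tau_max=1.0):
    """Quaternion rate feedback (QRF) control law sketch.

    q_err   : error quaternion [x, y, z, w] of the body frame
              relative to the target attitude
    omega   : body angular rate (rad/s), shape (3,)
    k_p,k_d : illustrative proportional/derivative gains
              (placeholders, not taken from the paper)
    tau_max : per-axis torque saturation limit, representing the
              actuator constraints the abstract mentions (N*m)

    Returns the commanded, saturated torque (N*m), shape (3,).
    """
    # Proportional term on the vector part of the error quaternion;
    # sign(w) picks the shorter rotation, avoiding quaternion unwinding.
    tau = -k_p * np.sign(q_err[3]) * q_err[:3] - k_d * omega
    # Model the saturation constraint on applied torques.
    return np.clip(tau, -tau_max, tau_max)
```

With zero attitude error and zero body rate the commanded torque is zero, and a small error about one axis yields a restoring torque about that axis only.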
