Unmanned Platform Decision Learning Method Using State Similarity for Experience Replay Sampling

Abstract

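The invention discloses an unmanned platform decision learning method that uses state similarity for experience replay sampling. The input is historical sample data drawn from an experience replay buffer. For each historical sample, the method computes the state similarity and the action similarity between that sample and the current policy of the unmanned platform's decision neural network model. Based on these state and action similarities, it assigns each sample a training weight, and it updates the decision neural network model according to these per-sample weights. The method limits how much data that differ substantially from the current policy can change the model. This mitigates a problem that arises when an unmanned platform's policy is updated with deep reinforcement learning: the data distribution in the replay buffer does not match the data distribution of the current policy. As a result, the historical data in the replay buffer are used more effectively, training-data utilization and training stability improve, and the unmanned platform learns a better and more stable policy.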
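
The abstract does not say how the similarities are measured or how they map to training weights. The following is a minimal sketch under explicitly assumed choices, not the patented method:
- Gaussian-kernel similarity.
- State similarity measured against states recently visited by the current policy.
- Action similarity measured against the action the current policy would take in the stored state.
- A hard threshold that gives dissimilar samples a reduced weight.
- A per-sample weighted squared-error loss standing in for the network's training loss.

All function names and the parameters `bandwidth`, `threshold` and `low_weight` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Policy = Callable[[Vector], Vector]


def gaussian_similarity(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Assumed metric: kernel similarity in (0, 1], equal to 1.0 when x == y."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def state_similarity(state: Vector, recent_states: List[Vector],
                     bandwidth: float = 1.0) -> float:
    """Assumed: closeness of a replayed state to the nearest state
    recently visited by the current policy."""
    return max(gaussian_similarity(state, s, bandwidth) for s in recent_states)


def action_similarity(state: Vector, action: Vector, policy: Policy,
                      bandwidth: float = 1.0) -> float:
    """Assumed: closeness of the stored action to the action the current
    policy would take in the same state."""
    return gaussian_similarity(action, policy(state), bandwidth)


def sample_weights(batch: List[Tuple[Vector, Vector]], policy: Policy,
                   recent_states: List[Vector], threshold: float = 0.5,
                   low_weight: float = 0.1) -> List[float]:
    """Hypothetical rule: full weight when both similarities pass the
    threshold, otherwise a reduced weight that limits the sample's update."""
    weights = []
    for state, action in batch:
        s_sim = state_similarity(state, recent_states)
        a_sim = action_similarity(state, action, policy)
        weights.append(1.0 if min(s_sim, a_sim) >= threshold else low_weight)
    return weights


def weighted_mse(predictions: List[float], targets: List[float],
                 weights: List[float]) -> float:
    """Weighted squared error averaged over the batch size (not the weight
    sum), so down-weighted samples contribute smaller gradients."""
    total = sum(w * (p - t) ** 2
                for w, p, t in zip(weights, predictions, targets))
    return total / len(predictions)


# Toy usage with a hypothetical 1-D deterministic policy.
def current_policy(state: Vector) -> Vector:
    return [0.5 * state[0]]


recent_states = [[0.0], [1.0]]   # states visited under the current policy
batch = [([0.0], [0.0]),         # close to the current policy
         ([1.0], [0.5]),         # close to the current policy
         ([5.0], [-3.0])]        # stale sample far from the current policy
weights = sample_weights(batch, current_policy, recent_states)
print(weights)  # [1.0, 1.0, 0.1]
print(weighted_mse([1.0, 2.0, 3.0], [1.0, 1.0, 13.0], weights))
```

This sketch normalizes the loss by batch size rather than by the sum of weights. Dividing by the weight sum would rescale the remaining samples and partly undo the intended limit on updates from data that differ from the current policy. In a deep reinforcement learning agent, the same weights could instead multiply a per-sample critic or actor loss before backpropagation. That step is also an assumption rather than something stated in the abstract.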

Bibliographic Data

Publication/Announcement No.: CN112734030B

Patent type: Invention patent
Publication/Announcement date: 2022.09.02

Original format: PDF
Applicant/Patentee: University of Science and Technology of China (中国科学技术大学)

Application/Patent No.: CN202011623599.6
Inventors: 庄连生 (Zhuang Liansheng); 张淦霖 (Zhang Ganlin); 李厚强 (Li Houqiang)

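Filing date: 2020.12.31
IPC classification: G06N3/08; G06N5/00
Agency: 北京凯特来知识产权代理有限公司 (Beijing Kaitelai Intellectual Property Agency Co., Ltd.)
Agents: 郑立明 (Zheng Liming); 付久春 (Fu Jiuchun)
Address: 230026, No. 96 Jinzhai Road, Baohe District, Hefei, Anhui Province
Database entry time: 2022-09-26 23:17:37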

Legal Information

Legal status announcement date: 2022-09-02
Legal status: Granted
Legal status information: Grant of invention patent right

Similar Documents

1. Unmanned platform decision learning method using state similarity for experience replay sampling [P]. Chinese patent: CN112734030A. 2021-04-30
2. Experience replay sampling reinforcement learning method and system based on the upper confidence bound principle [P]. Chinese patent: CN112734014A. 2021-04-30
3. Method and system for increasing the degree of autonomy of an unmanned aircraft by utilizing meteorological data received from GPS dropsondes released from an unmanned aircraft to determine course and altitude corrections and an automated data management and decision support navigational system to make these navigational calculations and to correct the unmanned aircraft's flight path [P]. US patent: US2009326792A1. 2009-12-31
4. Neural network on-device continuous learning method and apparatus for analyzing input data by optimized sampling of training image for smartphone, drone, ship or military purpose, and test method and apparatus using the same [P]. Japanese patent: JP2020123337A. 2020-08-13
5. Learning method and learning device capable of detecting lane by utilizing lane model, and test method and test equipment using this model Leading device for decision-making with use of leather model and test medium [P]. Japanese patent: JP6980289B2. 2021-12-15