U.S. Government Technical Reports

Learning to Cooperate in a Search Mission via Policy Search

Abstract

The dangers and the time involved in clearing an area of unexploded ordnance can be reduced by a system of unmanned, autonomous robots, and the system needs less time when more than one robot cooperates to search the area. The reinforcement learning algorithm GPOMDP is evaluated for the specific task of finding a decision rule that, given a map and the robot's position on it, enables the robot to choose automatically among the possible actions. The actions lead to a near-optimal path through an area in which some parts must be searched. A neural network serves as a function approximator that stores and improves the decision rule and selects actions according to it. The problem is then extended to two robots sharing the same decision rule; the setting is distributed in the sense that each robot picks actions according to its own perception of the surroundings, independently of the other robot's actions. To achieve cooperation, the robots are trained to maximize a shared reward equal to the sum of the individual rewards given for the consequences of each robot's actions. When the learnt policy is used to search the largest of the experimental areas, two robots trained with the shared reward need 70% of the time a single optimal robot would require, whereas two agents trained on their individual rewards need 88%.
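To make the abstract's method concrete, the core of GPOMDP is a score-function eligibility trace combined with an online average of reward-weighted traces. The sketch below is a minimal illustration on a hypothetical one-dimensional toy task (the grid, reward, and all parameter values are assumptions for illustration, not the report's UXO search domain, and a tabular softmax policy stands in for the report's neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task: move right to reach the marked cell (an assumed
# stand-in for the report's search problem, not its actual domain).
N_STATES, N_ACTIONS = 5, 2          # actions: 0 = left, 1 = right
GOAL = N_STATES - 1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def gpomdp(theta, beta=0.9, episodes=2000, alpha=0.1, horizon=20):
    """Single-agent GPOMDP: the eligibility trace z discounts and sums
    score-function terms, and delta averages r_t * z_t online."""
    for _ in range(episodes):
        s = 0
        z = np.zeros_like(theta)        # eligibility trace
        delta = np.zeros_like(theta)    # gradient estimate
        for t in range(horizon):
            p = softmax(theta[s])
            a = rng.choice(N_ACTIONS, p=p)
            # score function: grad of log pi(a|s) for a softmax policy
            g = np.zeros_like(theta)
            g[s] = -p
            g[s, a] += 1.0
            z = beta * z + g
            s, r = step(s, a)
            delta += (r * z - delta) / (t + 1)
        theta += alpha * delta          # gradient ascent on reward
    return theta

theta = gpomdp(np.zeros((N_STATES, N_ACTIONS)))
# Probability of choosing "right" in each non-goal state.
probs = np.array([softmax(theta[s]) for s in range(N_STATES - 1)])
print(probs[:, 1])
```

For the two-robot case described above, each robot would run this same update on its own observations, but the scalar `r` fed into the estimate would be the shared reward, i.e. the sum of both robots' individual rewards.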
