Conference: International Symposium on Neural Networks

Multiagent Reinforcement Learning Algorithm Using Temporal Difference Error

Abstract

In reinforcement learning, when an agent chooses an action and makes a state transition from its current state, an important question is how to reward the action the agent chose. In this paper, we start from Ant-Q learning, a population-based metaheuristic that combines positive feedback with greedy search and was proposed for hard combinatorial optimization problems such as the Traveling Salesman Problem (TSP), and we propose an ant reinforcement learning model using TD-error (ARLM-TDE). Experiments show that the proposed reinforcement learning method converges to the optimal solution faster than the original ACS and Ant-Q.
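
The abstract does not spell out the ARLM-TDE update rule, so the following is only a rough sketch of the general idea it builds on: the commonly cited Ant-Q value update rewritten in temporal-difference form. The function names `td_error` and `ant_q_update`, the dictionary layout of the AQ-values, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]  # (current city, next city); hypothetical representation


def td_error(aq: Dict[Edge, float], r: int, s: int,
             unvisited: Iterable[int], reward: float, gamma: float) -> float:
    """One-step TD error for an ant moving from city r to city s."""
    # Best AQ-value reachable from s among cities not yet visited
    # (0.0 when the tour is complete and nothing remains).
    best_next = max((aq[(s, z)] for z in unvisited), default=0.0)
    return reward + gamma * best_next - aq[(r, s)]


def ant_q_update(aq: Dict[Edge, float], r: int, s: int,
                 unvisited: Iterable[int], reward: float,
                 alpha: float, gamma: float) -> float:
    """Move AQ(r, s) toward its TD target and return the TD error used."""
    delta = td_error(aq, r, s, unvisited, reward, gamma)
    # Same as (1 - alpha) * AQ(r, s) + alpha * (reward + gamma * best_next).
    aq[(r, s)] += alpha * delta
    return delta
```

For example, with `aq = {(0, 1): 1.0, (1, 2): 2.0, (1, 3): 4.0}`, calling `ant_q_update(aq, 0, 1, [2, 3], reward=0.5, alpha=0.1, gamma=0.9)` moves `aq[(0, 1)]` a tenth of the way toward the target `0.5 + 0.9 * 4.0`. How ARLM-TDE actually uses the TD error to assign rewards is described in the full paper, not in this sketch.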
