Conference: Multiagent System Technologies

Optimistic-Pessimistic Q-Learning Algorithm for Multi-Agent Systems

Abstract

This paper proposes OP-Q, a reinforcement learning algorithm for multi-agent systems based on Hurwicz's optimistic-pessimistic criterion, which makes it possible to embed prior knowledge about the degree of environment friendliness. A proof of its convergence to a stationary policy is given. Thorough testing against well-known reinforcement learning algorithms shows that OP-Q performs on the level of its opponents.
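
The abstract does not state the update rule, so the following is only a minimal sketch of how Hurwicz's criterion (a weighted mix of the best and worst outcome of each action) might be plugged into a tabular Q-learning target. The function names, the table layout `q[state][own_action][other_action]`, the parameters `optimism`, `learning_rate`, and `discount`, and all numeric values are assumptions made for illustration; this is not the authors' exact OP-Q rule.

```python
def hurwicz_value(payoffs, optimism):
    """Hurwicz criterion: weighted mix of the best and worst payoff.

    optimism = 1.0 is purely optimistic (max), 0.0 purely pessimistic (min).
    """
    if not 0.0 <= optimism <= 1.0:
        raise ValueError("optimism must lie in [0, 1]")
    return optimism * max(payoffs) + (1.0 - optimism) * min(payoffs)


def op_q_update(q, state, own_action, other_action, reward, next_state,
                optimism, learning_rate=0.1, discount=0.9):
    """One tabular update (illustrative, hypothetical table layout)."""
    # Value of the next state: the own action that scores best under the
    # Hurwicz criterion, evaluated over the other agents' possible actions.
    next_value = max(hurwicz_value(row, optimism) for row in q[next_state])
    target = reward + discount * next_value
    old = q[state][own_action][other_action]
    q[state][own_action][other_action] = old + learning_rate * (target - old)
    return q[state][own_action][other_action]


# Two states, two own actions, two opponent actions; all values hypothetical.
q = [[[0.0, 0.0], [0.0, 0.0]],
     [[1.0, -1.0], [0.5, 0.0]]]
new_value = op_q_update(q, state=0, own_action=1, other_action=0,
                        reward=1.0, next_state=1, optimism=0.5)
```

In this sketch the `optimism` weight is where prior knowledge about environment friendliness would enter: a value near 1 treats the other agents as cooperative, while a value near 0 treats them as adversarial.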
