Venue: European Conference on Machine Learning and Knowledge Discovery in Databases

Expectation Maximization for Average Reward Decentralized POMDPs



Abstract

Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize the long-term effects of actions through a discount factor. In tasks such as wireless networking, agents are evaluated by their average performance over time, both short- and long-term effects of actions are crucial, and discounting-based solutions can perform poorly. We show that, under a common set of conditions, expectation maximization (EM) for average reward Dec-POMDPs becomes stuck in a local optimum. We introduce a new average reward EM method; in experiments it outperforms a state-of-the-art discounted-reward Dec-POMDP method.
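As a quick point of reference (standard definitions, not taken from the paper itself), the two planning criteria contrasted in the abstract can be written for a joint policy \pi as

V_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right], \qquad 0 \le \gamma < 1,

for the discounted criterion, and

\rho(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right]

for the average reward criterion. Under the latter, rewards in the far future carry the same weight as immediate ones, which is why discounting-based planners can perform poorly on tasks evaluated by long-run average performance.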
