Venue: European Conference on Machine Learning and Knowledge Discovery in Databases

Expectation Maximization for Average Reward Decentralized POMDPs



Abstract

Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize the long-term effects of actions through a discount factor. In tasks such as wireless networking, agents are evaluated by their average performance over time, both short- and long-term effects of actions are crucial, and discounting-based solutions can perform poorly. We show that, under a common set of conditions, expectation maximization (EM) for average reward Dec-POMDPs becomes stuck in a local optimum. We introduce a new average reward EM method; in experiments it outperforms a state-of-the-art discounted-reward Dec-POMDP method.
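As a quick point of reference (standard definitions, not taken from the paper itself), the two planning criteria contrasted in the abstract can be written for a joint policy \pi as

V_\gamma(\pi) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right], \qquad 0 \le \gamma < 1,

for the discounted criterion, and

\rho(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right]

for the average reward criterion. Under the latter, rewards in the far future carry the same weight as immediate ones, which is why discounting-based planners can perform poorly on tasks evaluated by long-run average performance.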
