Conference: Annual Conference on Neural Information Processing Systems

How to Combine Expert (or Novice) Advice when Actions Impact the Environment



Abstract

The so-called "experts algorithms" constitute a methodology for choosing actions repeatedly, when the rewards depend both on the choice of action and on the unknown current state of the environment. An experts algorithm has access to a set of strategies ("experts"), each of which may recommend which action to choose. The algorithm learns how to combine the recommendations of individual experts so that, in the long run, for any fixed sequence of states of the environment, it does as well as the best expert would have done relative to the same sequence. This methodology may not be suitable for situations where the evolution of states of the environment depends on past chosen actions, as is usually the case, for example, in a repeated non-zero-sum game. A new experts algorithm is presented and analyzed in the context of repeated games. It is shown that asymptotically, under certain conditions, it performs as well as the best available expert. This algorithm is quite different from previously proposed experts algorithms. It represents a shift from the paradigms of regret minimization and myopic optimization to consideration of the long-term effect of a player's actions on the opponent's actions or the environment. The importance of this shift is demonstrated by the fact that this algorithm is capable of inducing cooperation in the repeated Prisoner's Dilemma game, whereas previous experts algorithms converge to the suboptimal non-cooperative play.
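The contrast the abstract draws between myopic experts algorithms and algorithms that account for the opponent's reaction can be illustrated with a small simulation. This is a hypothetical toy, not the algorithm from the paper. It assumes the standard Prisoner's Dilemma payoffs (T=5, R=3, P=1, S=0), a Tit-for-Tat opponent, and two illustrative experts, "always cooperate" (C) and "always defect" (D). `hedge` is the standard multiplicative-weights experts algorithm, which scores each expert by the reward it would have earned in each stage. `phased_experts` is a simplified commit-for-a-phase scheme. Its name, phase-length rule and exploration schedule are assumptions made for this sketch. It scores each expert by the reward actually obtained while following it, so the opponent's response to our play is counted.

```python
import math
import random

# Assumed standard Prisoner's Dilemma payoffs for the row player
# (T=5 > R=3 > P=1 > S=0). "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
EXPERTS = ("C", "D")  # hypothetical experts: "always cooperate", "always defect"


class TitForTat:
    """Opponent that repeats the player's previous action (starts with C)."""

    def __init__(self):
        self.last_player_action = "C"

    def act(self):
        return self.last_player_action

    def observe(self, player_action):
        self.last_player_action = player_action


def hedge(stages, eta=0.1, seed=0):
    """Myopic experts algorithm (Hedge / multiplicative weights).

    Each expert is credited with the reward it would have earned against the
    opponent's realised action, ignoring that the opponent reacts to our play.
    Returns the final probability of playing D.
    """
    rng = random.Random(seed)
    opp = TitForTat()
    log_w = {e: 0.0 for e in EXPERTS}

    def prob_defect():
        m = max(log_w.values())  # subtract max for numerical stability
        w = {e: math.exp(log_w[e] - m) for e in EXPERTS}
        return w["D"] / (w["C"] + w["D"])

    for _ in range(stages):
        action = "D" if rng.random() < prob_defect() else "C"
        b = opp.act()
        for e in EXPERTS:  # full-information counterfactual update
            log_w[e] += eta * PAYOFF[(e, b)]
        opp.observe(action)
    return prob_defect()


def phased_experts(phases, seed=0):
    """Hypothetical commit-for-a-phase experts scheme (simplified sketch).

    A chosen expert is followed for a phase whose length equals the number of
    times it has been chosen, and is scored by the average reward actually
    obtained during that phase. Returns (estimates, fraction of stages at C).
    """
    rng = random.Random(seed)
    opp = TitForTat()
    est = {e: 0.0 for e in EXPERTS}
    times = {e: 0 for e in EXPERTS}
    coop_stages = total = 0
    for k in range(1, phases + 1):
        if k <= len(EXPERTS):
            e = EXPERTS[k - 1]             # try every expert once
        elif rng.random() < 1.0 / k:
            e = rng.choice(EXPERTS)        # explore with decreasing probability
        else:
            e = max(EXPERTS, key=est.get)  # exploit the best estimate
        times[e] += 1
        length = times[e]                  # phase length grows with use
        reward = 0
        for _ in range(length):
            b = opp.act()
            reward += PAYOFF[(e, b)]
            opp.observe(e)
        # Running average of the per-phase average rewards.
        est[e] += (reward / length - est[e]) / times[e]
        if e == "C":
            coop_stages += length
        total += length
    return est, coop_stages / total
```

Against Tit-for-Tat, defecting earns 1 or 2 more than cooperating against the same opponent action, so `hedge` concentrates its weight on D and play settles into mutual defection. `phased_experts` instead sees that long defect phases average close to P = 1 while long cooperate phases average close to R = 3, so it ends up cooperating in most stages.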
