Venue: International Joint Conference on Artificial Intelligence

Learning a Value Analysis Tool For Agent Evaluation

Abstract

Evaluating an agent's performance in a stochastic setting is necessary for agent development, scientific evaluation, and competitions. Traditionally, evaluation is done using Monte Carlo estimation; the magnitude of the stochasticity in the domain or the high cost of sampling, however, can often prevent the approach from yielding statistically significant conclusions. Recently, an advantage sum technique has been proposed for constructing unbiased, low-variance estimates of agent performance. The technique requires an expert to define a value function over states of the system, essentially a guess of the state's unknown value. In this work, we propose learning this value function from past interactions between agents in some target population. Our learned value functions have two key advantages: they can be applied in domains where no expert value function is available, and they can yield evaluation tuned to a specific population of agents (e.g., novice versus advanced agents). We demonstrate these two advantages in the domain of poker. We show that we can reduce variance over state-of-the-art estimators for a specific population of limit poker players, as well as construct the first variance-reducing estimators for no-limit poker and multi-player limit poker.
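The abstract describes the estimator only in outline. The following Python sketch shows the general idea in the spirit of the advantage sum technique, under stated assumptions. The setting is a hypothetical toy game with two chance events (not poker, and not taken from the paper). The final payoff is corrected by subtracting a zero-mean "luck" term at each chance event. The luck of the deal is measured with a value function V, learned from past games of a hypothetical population of threshold agents. All of the following are assumptions for illustration: the names `play`, `learn_value`, `luck_corrected_estimate`, `true_value`, and `demo`, the game rules, and the per-card-average fitting rule. The paper's value function and fitting procedure may differ.

```python
import random
import statistics

# Hypothetical toy game (not from the paper): chance deals the agent a card
# 0..9, the agent folds (losing an ante) or bets, then chance deals the
# opponent a card and the higher card wins the bet (ties pay nothing).
CARDS = range(10)
ANTE = 0.5
BET = 2.0


def expected_bet_payoff(card):
    """E[payoff | agent bets holding `card`], averaged over the opponent's card."""
    wins = sum(1 for d in CARDS if card > d)
    losses = sum(1 for d in CARDS if card < d)
    return BET * (wins - losses) / len(CARDS)


def play(threshold, rng):
    """Play one hand with a threshold agent; return (card, action, payoff)."""
    card = rng.choice(CARDS)  # chance event 1: the deal
    if card < threshold:
        return card, "fold", -ANTE
    opp = rng.choice(CARDS)  # chance event 2: the opponent's card
    payoff = BET if card > opp else -BET if card < opp else 0.0
    return card, "bet", payoff


def learn_value(history):
    """Fit a tabular V(card) as the mean payoff observed with that card.

    Assumption for illustration only: the paper learns V from past
    interactions of a target population, but its fitting rule may differ.
    """
    seen = {c: [] for c in CARDS}
    for card, _action, payoff in history:
        seen[card].append(payoff)
    return {c: statistics.fmean(p) if p else 0.0 for c, p in seen.items()}


def luck_corrected_estimate(card, action, payoff, value):
    """Subtract a zero-mean 'luck' term for each chance event from the payoff."""
    # Deal luck: V of the dealt card minus its expectation under uniform chance.
    deal_luck = value[card] - statistics.fmean(value[c] for c in CARDS)
    # Showdown luck: realized payoff minus its expectation given card and action.
    expected = expected_bet_payoff(card) if action == "bet" else -ANTE
    showdown_luck = payoff - expected
    return payoff - deal_luck - showdown_luck


def true_value(threshold):
    """Exact expected payoff of a threshold agent, used to check the estimates."""
    return statistics.fmean(
        expected_bet_payoff(c) if c >= threshold else -ANTE for c in CARDS
    )


def demo(agent=6, seed=0):
    """Learn V from a past population, then evaluate `agent` on fresh games."""
    rng = random.Random(seed)
    # "Past interactions" of a hypothetical population of threshold agents.
    past = [play(rng.choice([4, 5, 6, 7]), rng) for _ in range(5000)]
    value = learn_value(past)  # learned on data disjoint from the evaluation
    games = [play(agent, rng) for _ in range(2000)]
    naive = [p for _, _, p in games]
    corrected = [luck_corrected_estimate(c, a, p, value) for c, a, p in games]
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = demo()
    print(f"true value       {true_value(6):.3f}")
    for name, xs in (("Monte Carlo", naive), ("luck-corrected", corrected)):
        print(f"{name:<16} mean {statistics.fmean(xs):.3f}  "
              f"var {statistics.variance(xs):.3f}")
```

Both chance events are uniform, and V is fixed before the evaluation games are played. Each luck term therefore has zero expectation, so the corrected estimate stays unbiased for any V; a V that better tracks the outcomes of the agents being evaluated only lowers the variance. Running the script prints the threshold-6 agent's exact value (0.180) next to both estimates, and the luck-corrected variance should be far smaller than the Monte Carlo one. Fitting V on the evaluation games themselves would break the zero-mean argument, which is why `demo` learns V from a separate batch drawn from the target population.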