Sharing Information in Adversarial Bandit

Abstract

Two-player games provide a popular platform for research in Artificial Intelligence (AI). One of the main challenges arising from this platform is approximating a Nash Equilibrium (NE) of zero-sum matrix games. While computing such a Nash Equilibrium is solvable in polynomial time using Linear Programming (LP), it rapidly becomes infeasible as the size of the matrix grows, a situation commonly encountered in games. This paper focuses on improving the approximation of an NE for matrix games so that it outperforms state-of-the-art algorithms given a finite (and rather small) number T of oracle requests to rewards. To reach this objective, we propose to share information between the different relevant pure strategies. We show, both theoretically by improving the bound and empirically through experiments on artificial matrices and on a real-world game, that information sharing improves the approximation of the NE.
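
As an illustration of the setting the abstract describes, the following is a minimal sketch of approximating an NE of a zero-sum matrix game with a budget of T oracle requests. It uses plain EXP3 self-play, a standard adversarial bandit baseline, and does not implement the information-sharing scheme proposed in the paper, whose details are not given here. The function names, the Bernoulli reward oracle, the exploration rate `gamma`, and the assumption that rewards lie in [0, 1] are all hypothetical choices made for this example.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalised exponential weights with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def approximate_nash(matrix, T, gamma=0.1, seed=0):
    """Run EXP3 self-play for T oracle requests on a zero-sum matrix game.

    matrix[i][j] is the row player's expected reward, assumed to lie in
    [0, 1]; the column player receives 1 - reward. Returns the empirical
    mixed strategies (play frequencies) of both players.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    w_row, w_col = [1.0] * n_rows, [1.0] * n_cols
    count_row, count_col = [0] * n_rows, [0] * n_cols

    for _ in range(T):
        p = exp3_probs(w_row, gamma)
        q = exp3_probs(w_col, gamma)
        i = rng.choices(range(n_rows), weights=p)[0]
        j = rng.choices(range(n_cols), weights=q)[0]

        # One oracle request: a noisy Bernoulli reward with mean matrix[i][j]
        # (an assumed reward model, chosen only for illustration).
        r = 1.0 if rng.random() < matrix[i][j] else 0.0

        # Importance-weighted updates: only the arms actually played change.
        w_row[i] *= math.exp(gamma * (r / p[i]) / n_rows)
        w_col[j] *= math.exp(gamma * ((1.0 - r) / q[j]) / n_cols)

        # Rescale so the largest weight is 1, avoiding overflow on long runs.
        max_row, max_col = max(w_row), max(w_col)
        w_row = [w / max_row for w in w_row]
        w_col = [w / max_col for w in w_col]

        count_row[i] += 1
        count_col[j] += 1

    return [c / T for c in count_row], [c / T for c in count_col]


if __name__ == "__main__":
    # Matching pennies: the unique NE mixes both actions equally.
    row_strategy, col_strategy = approximate_nash([[1.0, 0.0], [0.0, 1.0]], T=5000)
    print(row_strategy, col_strategy)
```

Each pure strategy here is learned independently from its own sampled rewards, which is the baseline behaviour that sharing information between related pure strategies is meant to improve on when T is small.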

Bibliographic details

Source: European Conference on Applications of Evolutionary Computation, 2014, pp. 386-398 (13 pages)
Authors: David L. St-Pierre; Olivier Teytaud
Original format: PDF
Keywords: Bandit problem; Monte-Carlo; Nash Equilibrium; Games
