Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning

Abstract

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards; this task compactly captures the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only told whether the optimal arm lies to the "left" or to the "right" of the one pulled, with the feedback being erroneous with probability 1 - p. Our novel scheme for this problem is based on a Bayesian representation of the solution space, and combines this representation with Thompson sampling to balance exploration against exploitation. Furthermore, we introduce the possibility of traitorous environments that lie about the direction of the optimal arm (the adversarial learning problem). Empirical results show that our scheme deals with both traitorous and non-traitorous environments, significantly outperforming competing algorithms.
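The abstract describes the scheme only at a high level. As a minimal sketch, under stated assumptions, the following shows one way a Bayesian posterior over the optimal arm's position can be combined with Thompson sampling when the only feedback is a noisy "left"/"right" answer. It is not the authors' implementation: the discrete grid of N arms, the fair-coin reply when the optimal arm itself is pulled, and a learner that knows p are all assumptions made here, and `feedback`, `update_posterior` and `thompson_search` are hypothetical names.

```python
import random

LEFT, RIGHT = "left", "right"


def feedback(pulled: int, optimal: int, p: float, rng: random.Random) -> str:
    """Hypothetical environment: report which side of `pulled` the optimal arm lies on,
    truthfully with probability p and flipped with probability 1 - p (p < 0.5 = traitorous)."""
    if pulled == optimal:
        # No true direction exists here; answering with a fair coin is an assumption.
        return rng.choice((LEFT, RIGHT))
    truth = LEFT if optimal < pulled else RIGHT
    if rng.random() < p:
        return truth
    return RIGHT if truth == LEFT else LEFT


def update_posterior(posterior: list[float], pulled: int, reply: str, p: float) -> list[float]:
    """Bayes' rule over candidate positions x of the optimal arm, assuming p is known."""
    weights = []
    for x, prior in enumerate(posterior):
        if x == pulled:
            likelihood = 0.5  # matches the fair-coin assumption in feedback()
        else:
            truth = LEFT if x < pulled else RIGHT
            likelihood = p if reply == truth else 1.0 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def thompson_search(n_arms: int, optimal: int, p: float, steps: int, seed: int = 0) -> list[float]:
    """Each round: sample a candidate optimal arm from the posterior (Thompson sampling),
    pull it, observe the noisy direction, and update the posterior."""
    rng = random.Random(seed)
    posterior = [1.0 / n_arms] * n_arms  # uniform prior over the optimal arm's position
    for _ in range(steps):
        pulled = rng.choices(range(n_arms), weights=posterior)[0]
        reply = feedback(pulled, optimal, p, rng)
        posterior = update_posterior(posterior, pulled, reply, p)
    return posterior


if __name__ == "__main__":
    for p in (0.8, 0.2):  # truthful and traitorous environments
        post = thompson_search(n_arms=20, optimal=13, p=p, steps=400, seed=1)
        best = max(range(len(post)), key=post.__getitem__)
        print(f"p={p}: most probable arm {best} (posterior {post[best]:.3f})")
```

Running the module prints the most probable arm for a truthful (p = 0.8) and a traitorous (p = 0.2) environment. In this sketch the traitorous case needs no special handling only because p is assumed known, so the likelihood already expects mostly flipped answers; with p unknown, the learner would also have to infer whether the environment is truthful, which this sketch does not attempt.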

Bibliographic Details

Source: IFIP WG 12.5 International Conference on artificial intelligence applications and innovations, 2015, pp. 307-317 (11 pages)
Authors: Sondre Glimsdal; Ole-Christoffer Granmo
Original format: PDF
Keywords: N-Door Puzzle; Multi-armed Bandit Problem; Adversarial Learning; Bayesian Learning; Thompson Sampling

Similar Literature

1. Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems [J]. Sondre Glimsdal, Ole-Christoffer Granmo. Journal of machine learning research, 2019, issue a.

2. Thompson Sampling for Adversarial Bit Prediction [J]. Yuval Lewi, Haim Kaplan, Yishay Mansour. JMLR: Workshop and Conference Proceedings, 2020, no. 4.

3. A dynamic thompson sampling hyper-heuristic framework for learning activity planning in personalized learning [J]. European Journal of Operational Research, 2020, no. 2.

4. Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning [C]. Sondre Glimsdal, Ole-Christoffer Granmo. IEEE International Conference on Machine Learning and Applications, 2015.

5. Security Through Stochasticity - Toward Adversarial Defense Using Energy-Based Models [D]. Mitchell, Jonathan Craig. 2020.

6. Does Participation in Written Guided Reflective Practice Exercises Affect Readiness for Self-Directed Learning in a Sample of US Anesthesiology Residents? [O]. Amy K. Miller Juve, Jeffrey R. Kirsch. 2019.

7. Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning [O]. Glimsdal, Sondre; Granmo, Ole-Christoffer. 2015.
