Conference: IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations

Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning



Abstract

The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one of N slot-machine arms, randomly receiving either a reward or a penalty from each pull. The gambler aims to maximize the expected number of rewards received when the reward probabilities are unknown. The gambler must therefore identify, as quickly as possible, the arm with the highest probability of producing a reward; the problem thus compactly captures the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particularly challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is told only whether the optimal arm lies to the "left" or to the "right" of the arm pulled, and this feedback is erroneous with probability 1 - p. Our novel scheme for this problem is based on a Bayesian representation of the solution space, and combines this representation with Thompson sampling to balance exploration against exploitation. Furthermore, we introduce the possibility of traitorous environments that lie about the direction of the optimal arm (an adversarial learning problem). Empirical results show that our scheme handles both traitorous and non-traitorous environments, significantly outperforming competing algorithms.
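The abstract names the ingredients of the scheme (a Bayesian representation of the solution space combined with Thompson sampling) but not its update rule. The Python below is a minimal sketch under our own assumptions, not the authors' algorithm: the line is discretized into N arms indexed 0..N-1, the learner knows p (with 0.5 < p < 1), pulling the optimal arm itself yields a fair coin flip, and a traitorous environment is modeled as reporting the true direction with probability 1 - p instead of p. All names (`feedback_prob`, `LineThompsonSearcher`, `simulate`) are hypothetical. The posterior is kept over (position, traitor) pairs so that one learner can cope with both kinds of environment; each step draws one hypothesis from the posterior and pulls its arm, which is the Thompson-sampling step.

```python
import math
import random


def feedback_prob(says_right, pulled, position, traitor, p):
    """Probability of one 'right'/'left' answer if the optimum is at `position`.

    Hypothetical model: an honest environment reports the true direction with
    probability p, a traitorous one with probability 1 - p, and pulling the
    optimal arm itself yields a fair coin flip. Assumes 0.5 < p < 1.
    """
    if pulled == position:
        return 0.5
    truth_is_right = position > pulled
    p_truth = 1.0 - p if traitor else p
    return p_truth if says_right == truth_is_right else 1.0 - p_truth


class LineThompsonSearcher:
    """Bayesian posterior over (position, traitor) pairs plus Thompson sampling."""

    def __init__(self, n_arms, p, rng):
        self.p = p
        self.rng = rng
        self.hypotheses = [(x, t) for t in (False, True) for x in range(n_arms)]
        # Uniform prior, stored as log-weights so long runs do not underflow.
        self.log_w = [0.0] * len(self.hypotheses)

    def posterior(self):
        # Normalize in a numerically stable way (subtract the max log-weight).
        top = max(self.log_w)
        w = [math.exp(v - top) for v in self.log_w]
        total = sum(w)
        return [v / total for v in w]

    def choose_arm(self):
        # Thompson step: draw one hypothesis from the posterior, pull its arm.
        position, _ = self.rng.choices(self.hypotheses, weights=self.posterior())[0]
        return position

    def update(self, pulled, says_right):
        # Bayes rule: add the log-likelihood of the answer under each hypothesis.
        for i, (x, t) in enumerate(self.hypotheses):
            self.log_w[i] += math.log(feedback_prob(says_right, pulled, x, t, self.p))

    def best_guess(self):
        # Most probable (position, traitor) pair under the current posterior.
        post = self.posterior()
        return self.hypotheses[max(range(len(post)), key=post.__getitem__)]


def simulate(n_arms, optimum, p, traitor, steps, seed=0):
    """Run the searcher against a simulated (possibly traitorous) environment."""
    rng = random.Random(seed)
    searcher = LineThompsonSearcher(n_arms, p, rng)
    for _ in range(steps):
        arm = searcher.choose_arm()
        # The environment answers 'right' with the probability its own model implies.
        says_right = rng.random() < feedback_prob(True, arm, optimum, traitor, p)
        searcher.update(arm, says_right)
    return searcher.best_guess()
```

For example, `simulate(20, 13, 0.8, traitor=True, steps=2000)` runs the sketch against a lying environment and returns the most probable (position, traitor) pair. Keeping the weights in log space matters here: with plain probabilities, a hypothesis whose weight rounds to zero could never be sampled or recover, which would stall the search.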
