Journal of Machine Learning Research

Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems


Abstract

The multi-armed bandit problem forms the foundation for solving a wide range of online stochastic optimization problems through a simple, yet effective mechanism. One simply casts the problem as a gambler who repeatedly pulls one out of $N$ slot machine arms, eliciting random rewards. Learning of reward probabilities is then combined with reward maximization, by carefully balancing reward exploration against reward exploitation. In this paper, we address a particularly intriguing variant of the multi-armed bandit problem, referred to as the Stochastic Point Location (SPL) problem. The gambler is here only told whether the optimal arm (point) lies to the “left” or to the “right” of the arm pulled, with the feedback being erroneous with probability $1-\pi$. This formulation thus targets optimization in continuous action spaces with both informative and deceptive feedback. To tackle this class of problems, we formulate a compact and scalable Bayesian representation of the solution space that simultaneously captures both the location of the optimal arm as well as the probability of receiving correct feedback. We further introduce the accompanying Thompson Sampling guided Stochastic Point Location (TS-SPL) scheme for balancing exploration against exploitation. By learning $\pi$, TS-SPL also supports deceptive environments that are lying about the direction of the optimal arm. This, in turn, allows us to address the fundamental Stochastic Root Finding (SRF) problem. Empirical results demonstrate that our scheme deals with both deceptive and informative environments, significantly outperforming competing algorithms both for SRF and SPL.