Machine Learning (ML95), conference proceedings

Active Exploration and Learning in Real-Valued Spaces using Multi-Armed Bandit Allocation Indices



Abstract

A method for active learning is introduced that uses Gittins multi-armed bandit allocation indices to select actions which optimally trade off exploration and exploitation so as to maximize expected payoff. We apply the Gittins method to continuous action spaces by using the C4.5 algorithm to learn a mapping from state (or perception of state) and action to the success or failure of the action when taken in that state. The leaves of the resulting tree form a finite set of alternatives over the continuous space of actions. The action selected is drawn from the leaf that, among the leaves consistent with the perceived state, has the highest Gittins index. We illustrate the technique with a simulated robot learning task for grasping objects, where each grasping trial can be lengthy and it is desirable to reduce unnecessary experiments. For the grasping simulation, the Gittins index approach shows a statistically significant performance improvement over the Interval Estimation action-selection heuristic, with little increase in computational cost. The method also has the advantage of providing a principled way of choosing the exploration parameter based on the expected number of repetitions of the task.
