Machine Learning (ML95), conference proceedings

Active Exploration and Learning in Real-Valued Spaces using Multi-Armed Bandit Allocation Indices



Abstract

A method for active learning is introduced that uses Gittins multi-armed bandit allocation indices to select actions which optimally trade off exploration and exploitation so as to maximize expected payoff. We apply the Gittins method to continuous action spaces by using the C4.5 algorithm to learn a mapping from state (or perception of state) and action to the success or failure of the action when taken in that state. The leaves of the resulting tree form a finite set of alternatives over the continuous space of actions. The action selected is drawn from the leaf that, among the leaves consistent with the perceived state, has the highest Gittins index. We illustrate the technique with a simulated robot learning task for grasping objects, where each grasping trial can be lengthy and it is desirable to reduce unnecessary experiments. For the grasping simulation, the Gittins index approach shows a statistically significant performance improvement over the Interval Estimation action-selection heuristic, with little increase in computational cost. The method also has the advantage of providing a principled way of choosing the exploration parameter based on the expected number of repetitions of the task.
