Annual Conference on Neural Information Processing Systems (NeurIPS)

Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Abstract

The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem that arises when the input to the greedy algorithm is stochastic, with unknown parameters that have to be learned over time. We first propose the greedy regret and the ε-quasi-greedy regret as learning metrics that compare online performance against that of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedback, which use multi-armed bandit and pure-exploration bandit policies at each level of greedy learning, one for each of the regret metrics respectively. Both algorithms achieve an O(log T) problem-dependent regret bound (T being the time horizon) for a general class of combinatorial structures and reward functions that admit greedy solutions. We further show that the bound is tight in T and other problem-instance parameters.
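To make the setting concrete, here is a minimal illustrative sketch of online greedy learning with semi-bandit feedback on a toy problem: selecting a size-k subset of n items whose Bernoulli reward means are unknown, where each greedy level adds the item with the highest UCB1 index, and the feedback reveals a realized reward for every chosen item. The class name OnlineGreedyUCB, the UCB1 index, and the additive-reward assumption are all illustrative choices made here, not the paper's actual algorithms, which handle a general class of combinatorial structures and reward functions.

    import math
    import random

    class OnlineGreedyUCB:
        """Illustrative sketch (not the paper's algorithm): greedily build a
        size-k subset of n items, choosing each element by a UCB1 rule and
        updating per-item statistics from semi-bandit feedback (one observed
        reward per chosen item)."""

        def __init__(self, n_items, k):
            self.n, self.k = n_items, k
            self.counts = [0] * n_items   # times each item has been played
            self.sums = [0.0] * n_items   # cumulative observed reward per item
            self.t = 0                    # round counter

        def _ucb(self, i):
            if self.counts[i] == 0:
                return float("inf")       # force initial exploration
            mean = self.sums[i] / self.counts[i]
            return mean + math.sqrt(2.0 * math.log(self.t + 1) / self.counts[i])

        def select(self):
            """One greedy pass: at each of the k levels, add the not-yet-chosen
            item with the highest UCB index."""
            self.t += 1
            chosen = []
            for _ in range(self.k):
                best = max((i for i in range(self.n) if i not in chosen),
                           key=self._ucb)
                chosen.append(best)
            return chosen

        def update(self, chosen, rewards):
            """Semi-bandit feedback: a realized reward for every chosen item."""
            for i, r in zip(chosen, rewards):
                self.counts[i] += 1
                self.sums[i] += r

    # Usage on a toy instance with unknown Bernoulli item rewards.
    if __name__ == "__main__":
        true_means = [0.9, 0.8, 0.3, 0.2, 0.1]
        learner = OnlineGreedyUCB(n_items=5, k=2)
        for _ in range(2000):
            s = learner.select()
            feedback = [1.0 if random.random() < true_means[i] else 0.0 for i in s]
            learner.update(s, feedback)
        print("final selection:", sorted(learner.select()))

In this modular toy setting the per-item statistics can be shared across greedy levels; per the abstract, the paper instead maintains multi-armed bandit or pure-exploration bandit policies at each level of greedy learning, which is what allows it to cover non-additive reward functions.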