Annual Conference on Neural Information Processing Systems (NeurIPS)

Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Abstract

The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem that arises when the input to the greedy algorithm is stochastic, with unknown parameters that have to be learned over time. We first propose the greedy regret and the ε-quasi-greedy regret as learning metrics that compare online performance against that of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedback, which use multi-armed bandit and pure-exploration bandit policies at each level of greedy learning, one for each of the regret metrics respectively. Both algorithms achieve an O(log T) problem-dependent regret bound (T being the time horizon) for a general class of combinatorial structures and reward functions that admit greedy solutions. We further show that the bound is tight in T and other problem-instance parameters.
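To make the setting concrete, here is a minimal illustrative sketch of online greedy learning with semi-bandit feedback on a toy problem: selecting a size-k subset of n items whose Bernoulli reward means are unknown, where each greedy level adds the item with the highest UCB1 index, and the feedback reveals a realized reward for every chosen item. The class name OnlineGreedyUCB, the UCB1 index, and the additive-reward assumption are all illustrative choices made here, not the paper's actual algorithms, which handle a general class of combinatorial structures and reward functions.

    import math
    import random

    class OnlineGreedyUCB:
        """Illustrative sketch (not the paper's algorithm): greedily build a
        size-k subset of n items, choosing each element by a UCB1 rule and
        updating per-item statistics from semi-bandit feedback (one observed
        reward per chosen item)."""

        def __init__(self, n_items, k):
            self.n, self.k = n_items, k
            self.counts = [0] * n_items   # times each item has been played
            self.sums = [0.0] * n_items   # cumulative observed reward per item
            self.t = 0                    # round counter

        def _ucb(self, i):
            if self.counts[i] == 0:
                return float("inf")       # force initial exploration
            mean = self.sums[i] / self.counts[i]
            return mean + math.sqrt(2.0 * math.log(self.t + 1) / self.counts[i])

        def select(self):
            """One greedy pass: at each of the k levels, add the not-yet-chosen
            item with the highest UCB index."""
            self.t += 1
            chosen = []
            for _ in range(self.k):
                best = max((i for i in range(self.n) if i not in chosen),
                           key=self._ucb)
                chosen.append(best)
            return chosen

        def update(self, chosen, rewards):
            """Semi-bandit feedback: a realized reward for every chosen item."""
            for i, r in zip(chosen, rewards):
                self.counts[i] += 1
                self.sums[i] += r

    # Usage on a toy instance with unknown Bernoulli item rewards.
    if __name__ == "__main__":
        true_means = [0.9, 0.8, 0.3, 0.2, 0.1]
        learner = OnlineGreedyUCB(n_items=5, k=2)
        for _ in range(2000):
            s = learner.select()
            feedback = [1.0 if random.random() < true_means[i] else 0.0 for i in s]
            learner.update(s, feedback)
        print("final selection:", sorted(learner.select()))

In this modular toy setting the per-item statistics can be shared across greedy levels; per the abstract, the paper instead maintains multi-armed bandit or pure-exploration bandit policies at each level of greedy learning, which is what allows it to cover non-additive reward functions.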