ACM Transactions on Interactive Intelligent Systems

Efficient Interactive Multiclass Learning from Binary Feedback

Abstract

We introduce a novel algorithm called upper confidence-weighted learning (UCWL) for online multiclass learning from binary feedback (e.g., feedback that indicates whether the prediction was right or wrong). UCWL combines the upper confidence bound (UCB) framework with the soft confidence-weighted (SCW) online learning scheme. In UCB, each instance is classified using both score and uncertainty. For a given instance in the sequence, the algorithm might guess its class label primarily to reduce the class uncertainty. This is a form of informed exploration, which enables the performance to improve with lower sample complexity compared to the case without exploration. Combining UCB with SCW makes the algorithm robust to noisy and nonseparable data, and state-of-the-art performance is achieved without increasing the computational cost. A potential application setting is human-robot interaction (HRI), where the robot is learning to classify some set of inputs while the human teaches it by providing only binary feedback, or sometimes even the wrong answer entirely. Experimental results in the HRI setting and with two benchmark datasets from other settings show that UCWL outperforms other state-of-the-art algorithms in the online binary feedback setting. Surprisingly, it sometimes even outperforms state-of-the-art algorithms that receive full feedback (e.g., the true class label), even though UCWL gets only binary feedback on the same data sequence.
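The idea of classifying with both score and uncertainty, then learning from right/wrong feedback alone, can be illustrated with a minimal sketch. This is not the UCWL algorithm from the paper: it keeps one linear model per class with a mean vector and a covariance matrix, picks the class with the highest score plus an uncertainty bonus, and, for brevity, applies an AROW-style confidence-weighted update in place of SCW. The class name `UCBLinearBandit` and the parameters `eta` and `r` are hypothetical names chosen for illustration.

```python
import numpy as np


class UCBLinearBandit:
    """Illustrative sketch only (not the paper's UCWL): per-class linear
    models with a mean vector and a covariance matrix each."""

    def __init__(self, n_classes, n_features, eta=1.0, r=1.0):
        self.eta = eta  # weight of the uncertainty bonus (assumed name)
        self.r = r      # AROW regularization constant (assumed name)
        self.w = np.zeros((n_classes, n_features))
        self.sigma = np.stack([np.eye(n_features) for _ in range(n_classes)])

    def predict(self, x):
        # Score of each class under its mean weight vector.
        scores = self.w @ x
        # Uncertainty of each class along x: x^T Sigma_k x.
        variances = np.einsum("i,kij,j->k", x, self.sigma, x)
        bonus = self.eta * np.sqrt(np.maximum(variances, 0.0))
        # Upper-confidence choice: a class can win because it is uncertain.
        return int(np.argmax(scores + bonus))

    def update(self, x, k, correct):
        # Binary feedback only tells us whether predicted class k was right,
        # so only the model of class k is updated (AROW-style step).
        y = 1.0 if correct else -1.0
        sx = self.sigma[k] @ x
        beta = 1.0 / (x @ sx + self.r)
        alpha = max(0.0, 1.0 - y * (self.w[k] @ x)) * beta
        self.w[k] += alpha * y * sx
        self.sigma[k] -= beta * np.outer(sx, sx)
```

In an online loop, one would call `predict(x)`, show the guess to the teacher, and pass the resulting right/wrong signal to `update(x, k, correct)`. Each update shrinks the covariance of the guessed class along `x`, so the exploration bonus for that class fades as feedback accumulates.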