Journal: Machine Learning

Learning noisy linear classifiers via adaptive and selective sampling



Abstract

We introduce efficient margin-based algorithms for selective sampling and filtering in binary classification tasks. Experiments on real-world textual data reveal that our algorithms perform significantly better than popular and similarly efficient competitors. Using the so-called Mammen-Tsybakov low noise condition to parametrize the instance distribution, and assuming linear label noise, we show bounds on the convergence rate to the Bayes risk of a weaker adaptive variant of our selective sampler. Our analysis reveals that, excluding logarithmic factors, the average risk of this adaptive sampler converges to the Bayes risk at rate N^{−(1+α)(2+α)/(2(3+α))}, where N denotes the number of queried labels and α > 0 is the exponent in the low noise condition. For all α > √3 − 1 ≈ 0.73 this convergence rate is asymptotically faster than the rate N^{−(1+α)/(2+α)} achieved by the fully supervised version of the base selective sampler, which queries all labels. Moreover, for α → ∞ (hard margin condition) the gap between the semi- and fully-supervised rates becomes exponential.
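The margin-based selective sampling idea described above can be sketched as follows: maintain a linear predictor, and query the (noisy) label only when the current margin is small, i.e. when the prediction is uncertain. This is an illustrative simplification, not the paper's exact algorithm — the fixed `threshold`, the perceptron-style update, and the toy `oracle` are all hypothetical choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_sampling_sketch(X, oracle, threshold=0.2, lr=0.1):
    """Margin-based selective sampler (illustrative sketch only).

    Maintains a linear model w and queries the oracle for a label only
    when the current margin |w·x| falls below `threshold`; otherwise the
    example is predicted without spending a label query.
    """
    w = np.zeros(X.shape[1])
    queries = 0
    for x in X:
        margin = w @ x
        if abs(margin) <= threshold:      # low confidence: query the label
            y = oracle(x)
            queries += 1
            if y * margin <= 0:           # perceptron-style update on mistakes
                w += lr * y * x
        # high-confidence examples are classified but never queried
    return w, queries

# Toy usage: 2-D instances with linear label noise, i.e. the flip
# probability of the label grows as the true margin shrinks.
X = rng.normal(size=(500, 2))
w_true = np.array([1.0, -1.0])

def oracle(x):
    p_correct = 0.5 + 0.5 * min(1.0, abs(w_true @ x))
    y = np.sign(w_true @ x) or 1.0
    return y if rng.random() < p_correct else -y

w_hat, n_queries = selective_sampling_sketch(X, oracle)
print(n_queries, "labels queried out of", len(X))
```

The point of the sketch is the query rule: only a fraction of the stream's labels are requested, while the predictor is still trained on exactly the uncertain examples where a label is most informative.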
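The rate comparison in the abstract can be checked numerically. The exponents below are taken directly from the stated rates N^{−(1+α)(2+α)/(2(3+α))} (adaptive sampler) and N^{−(1+α)/(2+α)} (fully supervised baseline); a larger exponent means faster convergence. Setting the two exponents equal gives (2+α)² = 2(3+α), whose positive root is α = √3 − 1 ≈ 0.73, the crossover point quoted in the abstract.

```python
import math

def semi_exp(a):
    # exponent e in the adaptive sampler's rate N^{-e}
    return (1 + a) * (2 + a) / (2 * (3 + a))

def full_exp(a):
    # exponent e in the fully supervised rate N^{-e}
    return (1 + a) / (2 + a)

crossover = math.sqrt(3) - 1  # positive root of a^2 + 2a - 2 = 0
for a in (0.5, crossover, 1.0, 3.0):
    print(f"alpha={a:.3f}  semi={semi_exp(a):.3f}  full={full_exp(a):.3f}")
```

Below the crossover the fully supervised rate wins; above it the adaptive sampler's exponent is strictly larger, and it grows linearly in α while the supervised exponent stays below 1, which is the "exponential gap" claim for α → ∞.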
