Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

A noise-detection based AdaBoost algorithm for mislabeled data

Abstract

Noise sensitivity is a well-known issue of the AdaBoost algorithm. Previous work shows that AdaBoost is prone to overfitting on noisy data sets because it consistently assigns high weights to hard-to-learn instances (mislabeled instances or outliers). In this paper, a new boosting approach, named noise-detection based AdaBoost (ND-AdaBoost), is proposed to combine classifiers by emphasizing misclassified noisy instances and correctly classified non-noisy instances during training. Specifically, the algorithm integrates a noise-detection based loss function into AdaBoost to adjust the weight distribution at each iteration. Two evaluation criteria, one based on k-nearest neighbors (k-NN) and one based on expectation maximization (EM), are constructed to detect noisy instances. Further, a regeneration condition is presented and analyzed to control the bound on the ensemble training error of the proposed algorithm, which provides theoretical support. Finally, experiments on selected binary UCI benchmark data sets demonstrate that the proposed algorithm is more robust on noisy data sets than standard AdaBoost and other AdaBoost variants.
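
To make the idea of k-NN based noise detection concrete, here is a minimal sketch. It is not the evaluation criterion from the paper, which the abstract does not spell out. It assumes a simple majority-vote rule: an instance is flagged as suspected noise when its label disagrees with the majority label of its k nearest neighbors. The function name `knn_noise_flags`, the Euclidean distance, and the default `k=3` are all illustrative assumptions.

```python
from collections import Counter
from math import dist


def knn_noise_flags(points, labels, k=3):
    """Hypothetical helper: flag instance i as suspected noise when its label
    disagrees with the majority label of its k nearest neighbours."""
    flags = []
    for i, p in enumerate(points):
        # Rank every other instance by Euclidean distance to p, keep the k closest.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        majority, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        flags.append(majority != labels[i])
    return flags


# Toy data (made up for illustration): two well-separated clusters.
# The last point lies inside the +1 cluster but carries the label -1.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (0.05, 0.05)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

flags = knn_noise_flags(points, labels, k=3)
print(flags)  # only the last instance is flagged
```

In a boosting loop, flags of this kind could be used to treat suspected-noisy instances differently when the weight distribution is updated. How ND-AdaBoost actually does this is defined by its loss function in the paper, not by this sketch.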