
Predicting Rare Classes: Can Boosting Make Any Weak Learner Strong?


Abstract

Boosting is a strong ensemble-based learning algorithm with the promise of iteratively improving classification accuracy using any base learner, as long as the base learner satisfies the condition of yielding weighted accuracy > 0.5. In this paper, we analyze boosting with respect to this basic condition on the base learner, to see whether boosting ensures prediction of rarely occurring events with high recall and precision. First, we show that a base learner can satisfy the required condition even at poor recall or precision levels, especially for very rare classes. Furthermore, we show that the intelligent weight-updating mechanism in boosting, even in its strong cost-sensitive form, does not prevent cases where the base learner always achieves high precision but poor recall, or high recall but poor precision, when mapped to the original distribution. In either of these cases, we show that the voting mechanism of boosting fails to achieve good overall recall and precision for the ensemble. In effect, our analysis indicates that one cannot be blind to the base learner's performance and simply rely on the boosting mechanism to compensate for its weakness. We validate our arguments empirically on a variety of real and synthetic rare class problems. In particular, using AdaCost as the boosting algorithm, and variations of PNrule and RIPPER as the base learners, we show that if algorithm A achieves a better recall-precision balance than algorithm B, then using A as the base learner in AdaCost yields significantly better performance than using B as the base learner.
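The abstract's first claim can be illustrated with a minimal numerical sketch (not taken from the paper's experiments): on a dataset where the positive class is 1% of the examples, a trivial base learner that always predicts the majority class already satisfies boosting's weighted-accuracy > 0.5 condition on the initial uniform weights, while achieving zero recall on the rare class. The class sizes below are assumptions chosen purely for illustration.

```python
# Hypothetical rare-class dataset: 10 positives among 1000 examples (1% rare).
n_pos, n_neg = 10, 990
labels = [1] * n_pos + [0] * n_neg
weights = [1.0 / (n_pos + n_neg)] * (n_pos + n_neg)  # uniform initial weights

# A trivial base learner that always predicts the majority (negative) class.
preds = [0] * len(labels)

# Weighted accuracy: total weight of correctly classified examples.
weighted_acc = sum(w for w, y, p in zip(weights, labels, preds) if y == p)

# Recall on the rare (positive) class.
tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
recall = tp / n_pos

print(f"weighted accuracy = {weighted_acc:.3f}")  # 0.990, well above 0.5
print(f"recall on rare class = {recall:.1f}")     # 0.0
```

This shows why the paper's condition is weak for rare classes: the "weak learner" requirement is met by a model that is useless for detecting the minority class, motivating the paper's argument that the base learner's own recall-precision behavior cannot be ignored.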

