Source: String Processing and Information Retrieval; Lecture Notes in Computer Science; 4209

MP-Boost: A Multiple-Pivot Boosting Algorithm and Its Application to Text Categorization

Abstract

AdaBoost.MH is a popular supervised learning algorithm for building multi-label (aka n-of-m) text classifiers. It belongs to the family of "boosting" algorithms and works by iteratively building a committee of "decision stump" classifiers, each of which is trained to concentrate especially on the document-category pairs that the previously generated classifiers have found harder to classify correctly. Each decision stump hinges on a specific "pivot term", checking its presence or absence in the test document in order to take its classification decision. In this paper we propose an improved version of AdaBoost.MH, called MP-Boost, obtained by selecting, at each iteration of the boosting process, not one but several pivot terms, one for each category. The rationale behind this choice is that it provides highly individualized treatment for each category, since each iteration generates, for each category, the best possible decision stump. We present the results of experiments showing that MP-Boost is much more effective than AdaBoost.MH. In particular, the improvement in effectiveness is spectacular when few boosting iterations are performed, and (only) high when many are performed. The improvement is especially significant in the case of macroaveraged effectiveness, which shows that MP-Boost is especially good at working with hard, infrequent categories.
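
The difference between one shared pivot term and one pivot term per category can be illustrated with a minimal Python sketch. The abstract does not state the selection criterion, so the sketch assumes the normalization-factor criterion Z commonly used with real-valued decision stumps in boosting (smaller is better); the function names `stump_scores`, `single_pivot`, and `multiple_pivots`, as well as the toy data, are hypothetical and are not taken from the paper.

```python
import math

def stump_scores(X, Y, D):
    """Return Z[t][c], the (assumed) selection score of pivot term t for category c.

    X[i][t] is 1 if term t occurs in document i, else 0.
    Y[i][c] is +1 if document i belongs to category c, else -1.
    D[i][c] is the current boosting weight of the (document, category) pair.
    Lower Z means the stump separates positives from negatives better.
    """
    n_docs, n_terms, n_cats = len(X), len(X[0]), len(Y[0])
    Z = [[0.0] * n_cats for _ in range(n_terms)]
    for t in range(n_terms):
        for c in range(n_cats):
            # w[x][s]: total weight with term presence x and label s (0 = neg, 1 = pos)
            w = [[0.0, 0.0], [0.0, 0.0]]
            for i in range(n_docs):
                w[X[i][t]][1 if Y[i][c] > 0 else 0] += D[i][c]
            Z[t][c] = 2.0 * sum(math.sqrt(w[x][0] * w[x][1]) for x in (0, 1))
    return Z

def single_pivot(Z):
    """Single-pivot choice: one term shared by all categories (lowest total score)."""
    return min(range(len(Z)), key=lambda t: sum(Z[t]))

def multiple_pivots(Z):
    """Multiple-pivot choice: the lowest-scoring term for each category separately."""
    return [min(range(len(Z)), key=lambda t: Z[t][c]) for c in range(len(Z[0]))]

# Toy data (hypothetical): term 0 predicts category 0, term 1 predicts category 1.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = [[+1, -1], [+1, +1], [-1, +1], [-1, -1]]
D = [[1.0 / 8] * 2 for _ in range(4)]  # uniform initial weights over all pairs

Z = stump_scores(X, Y, D)
shared = single_pivot(Z)
per_category = multiple_pivots(Z)
shared_cost = sum(Z[shared])
per_category_cost = sum(Z[t][c] for c, t in enumerate(per_category))
```

On this toy data no single term serves both categories well, so the shared pivot leaves a nonzero total score, while the per-category choice picks a different term for each category. By construction the per-category total can never exceed the shared-pivot total, which is the intuition behind the "best possible decision stump for each category" claim above.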
