Journal: Knowledge-Based Systems

RFBoost: An improved multi-label boosting algorithm and its application to text categorisation



Abstract

The AdaBoost.MH boosting algorithm is considered to be one of the most accurate algorithms for multi-label classification. AdaBoost.MH works by iteratively building a committee of weak hypotheses in the form of decision stumps. In each round of AdaBoost.MH learning, all features are examined, but only one feature is used to build a new weak hypothesis. This learning mechanism may entail a high degree of computational time complexity, particularly in the case of a large-scale dataset. This paper describes a way to manage the learning complexity and improve the classification performance of AdaBoost.MH. We propose an improved version of AdaBoost.MH, called RFBoost. The weak learning in RFBoost is based on filtering a small fixed number of ranked features in each boosting round rather than using all features, as AdaBoost.MH does. We propose two methods for ranking the features: One Boosting Round and Labeled Latent Dirichlet Allocation (LLDA), a supervised topic model based on Gibbs sampling. Additionally, we investigate the use of LLDA as a feature selection method for reducing the feature space based on the maximal conditional probabilities of words across labels. Our experimental results on eight well-known benchmarks for multi-label text categorisation show that RFBoost is significantly more efficient and effective than the baseline algorithms. Moreover, the LLDA-based feature ranking yields the best performance for RFBoost. (C) 2016 Elsevier B.V. All rights reserved.
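
The contrast the abstract draws (every feature examined per round versus a small fixed number of ranked features) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary word-presence features, labels in {-1, +1}, and real-valued stump votes in the usual AdaBoost.MH style. The function names, the `ranked` and `k` parameters, the toy data, and in particular the rule that a chosen feature leaves the ranked pool are all illustrative assumptions; the abstract does not specify how the window of ranked features is maintained, and the ranking here is supplied by hand rather than by One Boosting Round or LLDA.

```python
import numpy as np

def stump(X, Y, D, j, eps):
    """Score a one-word decision stump on binary feature j.

    Returns (Z, c): Z is the normaliser to minimise, and c[b, l] is the
    real-valued vote for label l when word j is absent (b=0) or present (b=1).
    """
    present = X[:, j].astype(bool)
    pos, neg = D * (Y > 0), D * (Y < 0)
    w_pos = np.stack([pos[~present].sum(axis=0), pos[present].sum(axis=0)])
    w_neg = np.stack([neg[~present].sum(axis=0), neg[present].sum(axis=0)])
    Z = 2.0 * np.sqrt(w_pos * w_neg).sum()
    c = 0.5 * np.log((w_pos + eps) / (w_neg + eps))
    return Z, c

def boost(X, Y, rounds, ranked=None, k=None):
    """Boost stumps over all features, or over only the top-k ranked ones.

    With ranked=None and k=None every feature is scored in every round.
    Otherwise only the first k entries of the ranked pool are scored
    (assumed window rule, not taken from the paper).
    """
    n, d = X.shape
    n_labels = Y.shape[1]
    D = np.full((n, n_labels), 1.0 / (n * n_labels))  # weights over (doc, label)
    eps = 1.0 / (n * n_labels)                        # smoothing for the log ratio
    pool = list(range(d)) if ranked is None else list(ranked)
    model, examined = [], 0
    for _ in range(rounds):
        candidates = pool if k is None else pool[:k]
        if not candidates:
            break
        best_j, best_Z, best_c = None, np.inf, None
        for j in candidates:
            Z, c = stump(X, Y, D, j, eps)
            examined += 1
            if Z < best_Z:
                best_j, best_Z, best_c = j, Z, c
        model.append((best_j, best_c))
        if k is not None:
            pool.remove(best_j)  # assumption: a used feature leaves the pool
        h = best_c[X[:, best_j].astype(int)]  # votes per (doc, label)
        D *= np.exp(-Y * h)                   # down-weight pairs already handled
        D /= D.sum()
    return model, examined

def predict(model, X):
    """Sum the stump votes and threshold at zero."""
    scores = sum(c[X[:, j].astype(int)] for j, c in model)
    return np.where(scores > 0, 1, -1)

# Toy corpus: 8 documents, 6 words; label 0 follows word 0, label 1 follows word 3.
X = np.array([[1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 0], [0, 1, 0, 1, 1, 1],
              [1, 0, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0],
              [1, 1, 1, 1, 0, 1], [0, 0, 0, 0, 1, 1]])
Y = np.where(X[:, [0, 3]] == 1, 1, -1)

base_model, base_examined = boost(X, Y, rounds=2)
rf_model, rf_examined = boost(X, Y, rounds=2, ranked=[0, 3, 1, 2, 4, 5], k=2)
print("stumps scored:", base_examined, "vs", rf_examined)
```

The saving comes entirely from the inner loop: per round the cost drops from scoring every feature to scoring `k` of them, so the quality of the ranking decides whether the restricted search still finds a useful pivot word.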
