Journal of Information Science

LDA-AdaBoost.MH: Accelerated AdaBoost.MH based on latent Dirichlet allocation for text categorization



Abstract

AdaBoost.MH is a boosting algorithm considered to be one of the most accurate algorithms for multi-label classification. It works by iteratively building a committee of weak hypotheses in the form of decision stumps. To build each weak hypothesis, AdaBoost.MH takes all of the extracted features in every iteration and examines them one by one to check their ability to characterize the appropriate category. Using a Bag-of-Words text representation therefore dramatically increases the computational time of AdaBoost.MH learning, especially on large-scale datasets. In this paper we demonstrate how to improve the efficiency and effectiveness of AdaBoost.MH by using latent topics rather than words. A well-known probabilistic topic modelling method, Latent Dirichlet Allocation (LDA), is used to estimate the latent topics of the corpus, and these topics serve as the features for AdaBoost.MH. To evaluate LDA-AdaBoost.MH, four datasets were used: Reuters-21578-ModApte, WebKB, 20-Newsgroups and a collection of Arabic news. The experimental results confirm that representing the texts as a small number of latent topics, rather than a large number of words, significantly decreases the computational time of AdaBoost.MH learning and improves its performance on text categorization.
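The pipeline described in the abstract (count the words of each document, compress the counts into a small number of LDA topic proportions, then boost decision stumps over those proportions) can be sketched with off-the-shelf tools. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's SAMME-style AdaBoostClassifier rather than AdaBoost.MH, the single-label 20 Newsgroups corpus rather than the multi-label Reuters-21578-ModApte split, and an arbitrary choice of 100 topics.

```python
# Sketch of the LDA-features-for-boosting idea from the abstract.
# Assumptions: scikit-learn's AdaBoostClassifier (SAMME-style multiclass
# boosting, not AdaBoost.MH), single-label 20 Newsgroups data, 100 topics.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# Bag-of-Words term counts: the large, sparse representation the paper starts from.
vectorizer = CountVectorizer(max_features=20000, stop_words="english")
X_train_bow = vectorizer.fit_transform(train.data)
X_test_bow = vectorizer.transform(test.data)

# LDA compresses each document into 100 topic proportions (an arbitrary choice
# here; the number of topics is a tunable parameter).
lda = LatentDirichletAllocation(n_components=100, learning_method="online",
                                random_state=0)
X_train_topics = lda.fit_transform(X_train_bow)
X_test_topics = lda.transform(X_test_bow)

# Boosted decision stumps: AdaBoostClassifier's default weak learner is a
# depth-1 decision tree, i.e. a stump, matching the weak hypotheses above.
booster = AdaBoostClassifier(n_estimators=200, random_state=0)
booster.fit(X_train_topics, train.target)
pred = booster.predict(X_test_topics)
print("accuracy on 20 Newsgroups:", accuracy_score(test.target, pred))
```

Fitting the booster on X_train_bow instead of X_train_topics gives the Bag-of-Words baseline, which illustrates the speed difference the abstract refers to: each boosting round then has to scan tens of thousands of word features instead of 100 topic features.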
