Conference: Chinese Control Conference

Multi-LDA hybrid topic model with boosting strategy and its application in text classification

Abstract

Topic modeling, especially Latent Dirichlet Allocation (LDA), is an effective approach to feature selection and dimensionality reduction in text categorization tasks. Unlike the traditional Vector Space Model, LDA can overcome the curse of dimensionality and the feature sparsity problem. Mapping from the word space to the topic space brings further benefits, but it also makes the choice of model parameters a new difficulty. This article proposes a novel classification algorithm that combines different models with different parameters via a boosting strategy. Naïve Bayes and Support Vector Machine are employed as weak classifiers, and a weighting method is proposed to improve accuracy by integrating the weak classifiers into a strong classifier in a more rational way. Experimental results show that the method performs well in both accuracy and generalization.
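The abstract does not give the weighting formula, so the following is only a minimal sketch of how weak classifiers might be combined by weighted voting. It assumes the multi-class AdaBoost (SAMME) weight, which is swapped in here for illustration and is not confirmed as the authors' method. The classifier names, error rates, class count, and labels are all hypothetical.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate, n_classes):
    """SAMME-style weight (an assumption, not the paper's formula).

    A lower error rate yields a larger weight.
    """
    eps = 1e-10
    err = min(max(error_rate, eps), 1 - eps)  # keep the logarithm finite
    return math.log((1 - err) / err) + math.log(n_classes - 1)


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical weak classifiers, each trained on topic features from an
# LDA model with a different topic count: (description, validation error rate).
weak = [
    ("Naive Bayes on LDA, 20 topics", 0.30),
    ("SVM on LDA, 50 topics", 0.15),
    ("Naive Bayes on LDA, 100 topics", 0.25),
]
n_classes = 4  # hypothetical number of document categories

weights = [classifier_weight(err, n_classes) for _, err in weak]

# Hypothetical predictions of the three weak classifiers for one document.
preds = ["sports", "politics", "sports"]
label = weighted_vote(preds, weights)
print(label)
```

In this sketch the most accurate single classifier votes for "politics", yet the two weaker classifiers together carry more weight, so the combined prediction is "sports". A full boosting procedure would also reweight the training samples between rounds, which is omitted here.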