Journal: 《计算机应用》 (Journal of Computer Applications)

Label Propagation Algorithm Based on the LDA Topic Model

         

Abstract

The Label Propagation (LP) algorithm is a semi-supervised classification method. It performs poorly in text classification for two reasons: it requires the classification results to satisfy the manifold assumption, and computing similarities becomes expensive when the data are high-dimensional. To address these problems, the principles and complexity of the Latent Dirichlet Allocation (LDA) topic model and of the LP algorithm were analyzed, and the two were combined into a label propagation algorithm based on the LDA topic model, LPLDA. The algorithm represents each document by its latent topics in the LDA model. On the one hand, this representation ensures that the classification results conform to the manifold assumption; on the other hand, it reduces the dimensionality of the matrices involved and therefore the time LP spends computing similarities. Experimental results show that when the labeled data are fewer than the samples to be classified, LPLDA outperforms traditional supervised text classification methods.
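The propagation step that the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure or the LP variant, so the sketch assumes a Gaussian (RBF) similarity and the standard iterate-and-clamp form of label propagation. The matrix `theta`, the helper names, and the parameters `sigma` and `n_iter` are hypothetical; `theta` stands in for the document-topic proportions that a trained LDA model would supply.

```python
import math

def rbf_affinity(X, sigma=0.5):
    """Pairwise Gaussian similarity between topic-proportion vectors (assumed kernel)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def label_propagation(X, labels, n_classes, sigma=0.5, n_iter=100):
    """labels[i] is a class index, or None for an unlabeled document."""
    n = len(X)
    W = rbf_affinity(X, sigma)
    # Row-normalise similarities into transition probabilities.
    T = [[w / sum(row) for w in row] for row in W]

    def initial(i):
        # One-hot for labeled documents, uniform for unlabeled ones.
        if labels[i] is None:
            return [1.0 / n_classes] * n_classes
        return [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]

    Y = [initial(i) for i in range(n)]
    for _ in range(n_iter):
        # Each document takes the similarity-weighted average of all label distributions.
        Y_new = [[sum(T[i][j] * Y[j][c] for j in range(n))
                  for c in range(n_classes)] for i in range(n)]
        # Clamp the labeled documents back to their known labels.
        for i in range(n):
            if labels[i] is not None:
                Y_new[i] = initial(i)
        Y = Y_new
    return [max(range(n_classes), key=lambda c: Y[i][c]) for i in range(n)]

# Hypothetical document-topic proportions (K = 3 topics) standing in for LDA output.
theta = [
    [0.90, 0.05, 0.05],  # labeled, class 0
    [0.05, 0.90, 0.05],  # labeled, class 1
    [0.80, 0.10, 0.10],  # unlabeled
    [0.85, 0.05, 0.10],  # unlabeled
    [0.10, 0.80, 0.10],  # unlabeled
    [0.05, 0.85, 0.10],  # unlabeled
]
labels = [0, 1, None, None, None, None]
predicted = label_propagation(theta, labels, n_classes=2)
print(predicted)
```

In this toy setting the similarity matrix is built over K = 3 topic proportions rather than over a vocabulary-sized term vector, which is the source of the time saving the abstract mentions: each pairwise similarity costs O(K) instead of O(V) for a vocabulary of size V.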
