Journal: 《计算机工程与设计》 (Computer Engineering and Design)

Short Text Classification Based on the LDA Topic Model

Abstract

To address the high dimensionality and weak semantic features of the traditional vector space model (VSM) in short text classification, this paper proposes a classification method based on the similarity of topic distributions from a latent Dirichlet allocation (LDA) model. To address the limited content, short length, and sparse features of short texts, it further proposes an improved topic distribution vector derived from the LDA topic-word distribution matrix. Compared with the traditional VSM classification method, the approach reduces the dimensionality of the similarity computation and incorporates some semantic features. Experimental results show that, relative to the traditional VSM method, the topic distribution similarity method raises the average F1 score by 4.5%, and the improved topic distribution vector method based on the LDA topic-word distribution matrix raises it by 5.2%, which verifies the effectiveness of both methods.
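The idea of classifying by topic distribution similarity can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: cosine similarity as the similarity measure, one reference topic distribution per class, and a topic vector built by summing the topic-word matrix columns of the words in the text. The function names, the toy vocabulary, and the numbers are hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_vector_from_words(tokens, phi, vocab):
    """Assumed construction: sum the topic-word matrix columns of the
    in-vocabulary tokens, then normalise so the entries sum to 1.

    phi is a K x V matrix (list of K rows), vocab maps word -> column index.
    """
    num_topics = len(phi)
    vec = [0.0] * num_topics
    for tok in tokens:
        col = vocab.get(tok)
        if col is None:
            continue  # out-of-vocabulary words contribute nothing
        for t in range(num_topics):
            vec[t] += phi[t][col]
    total = sum(vec)
    # Fall back to a uniform distribution when no token is in the vocabulary.
    return [x / total for x in vec] if total else [1.0 / num_topics] * num_topics


def classify(theta, class_profiles):
    """Return the class whose reference topic distribution is most similar."""
    return max(class_profiles, key=lambda c: cosine(theta, class_profiles[c]))


# Toy data (hypothetical): 2 topics, 4 vocabulary words.
vocab = {"goal": 0, "match": 1, "stock": 2, "market": 3}
phi = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 favours sports words
    [0.05, 0.05, 0.45, 0.45],  # topic 1 favours finance words
]
class_profiles = {"sports": [0.9, 0.1], "finance": [0.1, 0.9]}

theta = topic_vector_from_words(["stock", "market", "rally"], phi, vocab)
label = classify(theta, class_profiles)
print(label, theta)
```

The similarity here is computed over K topic dimensions rather than over the full vocabulary, which is the dimensionality reduction the abstract refers to.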
