Journal: 计算机工程与科学 (Computer Engineering & Science)

Optimization and Visualization of the CTM Model in Text Classification

     

Abstract

Automatically extracting relevant information from massive text collections has become a major technical challenge. Text classification is an important way to address this problem and has attracted wide attention; within it, text representation is a key factor affecting classification performance. We use the correlated topic model (CTM) for text representation, which preserves the integrity of the information while capturing the correlations between topics. Based on this model, we optimize both the number of topics and feature selection: the number of topics is determined by combining perplexity with the log-likelihood function, and a principal component analysis algorithm based on mutual information is introduced for feature selection, which reduces data dimensionality and the redundancy of text features. The experimental results are visualized and analyzed using the R language.
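The abstract does not spell out how perplexity and log-likelihood are combined, so the following is only a minimal sketch of the perplexity half, assuming the standard definition: perplexity is the exponential of the negative held-out log-likelihood divided by the total token count. It is written in Python for illustration (the paper itself reports using R); the function names `perplexity` and `pick_num_topics` are hypothetical, and the numbers in `toy` are made up purely to exercise the code, not taken from the paper.

```python
import math


def perplexity(log_likelihoods, token_counts):
    """Held-out perplexity: exp(-(sum of log-likelihoods) / (total tokens))."""
    total_tokens = sum(token_counts)
    if total_tokens <= 0:
        raise ValueError("token_counts must sum to a positive number")
    return math.exp(-sum(log_likelihoods) / total_tokens)


def pick_num_topics(candidates):
    """Return the topic count with the lowest perplexity, plus all scores.

    candidates maps a topic count K to a pair
    (per-document log-likelihoods, per-document token counts),
    both measured on held-out documents for a model fitted with K topics.
    """
    scores = {k: perplexity(ll, n) for k, (ll, n) in candidates.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical values for illustration only (not results from the paper).
toy = {
    10: ([-700.0, -650.0], [100, 100]),
    20: ([-640.0, -610.0], [100, 100]),
    30: ([-660.0, -630.0], [100, 100]),
}
best_k, scores = pick_num_topics(toy)
print(best_k, scores)
```

When every candidate is scored on the same held-out documents, the lowest perplexity coincides with the highest log-likelihood, so any additional weighting between the two criteria would have to come from the full paper.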
