Journal: 《计算机工程与科学》 (Computer Engineering & Science)

Text Topic Clustering Algorithm Based on Word Co-occurrence

Abstract

The topic of a text is the key to text clustering, and co-occurring word pairs are a very strong indicator of a document's topic. Building on a study of existing algorithms for text topic mining and co-occurring word pair extraction, this paper proposes a text topic clustering algorithm based on association rules and word co-occurrence (TCABARWC). First, the algorithm extracts the co-occurring word pairs of each document with an association rule mining algorithm and uses word co-occurrence to obtain the text's topic information. Second, it models the documents in terms of these co-occurring word pairs and implements a similarity measure over them. Finally, it applies a hierarchical clustering algorithm to cluster the documents. Experimental results show that, compared with other clustering algorithms, hierarchical clustering based on association-rule co-occurring word pairs greatly reduces both the dimensionality of the text vectors and the algorithm's complexity, significantly improves clustering efficiency and accuracy, and yields good clustering quality.
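The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's TCABARWC implementation: the abstract does not give the mining procedure, the similarity formula, or the linkage criterion, so everything here is an assumption made for illustration. A document-level support count over word pairs stands in for association rule mining, Jaccard similarity over pair sets stands in for the co-occurring word pair similarity measure, and average linkage is used for the hierarchical step. The function names and the parameters `min_support` and `k` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support):
    """Return word pairs that co-occur in at least `min_support` documents.

    Assumption: a plain support threshold on 2-itemsets stands in for the
    association rule mining step.
    """
    counts = Counter()
    for tokens in docs:
        # Count each unordered pair at most once per document.
        counts.update(combinations(sorted(set(tokens)), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def pair_profile(tokens, pairs):
    """Represent a document by the frequent pairs it contains."""
    return set(combinations(sorted(set(tokens)), 2)) & pairs


def jaccard(a, b):
    """Jaccard similarity of two pair sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, min_support=2, k=2):
    """Agglomerative clustering (assumed average linkage) down to k clusters."""
    pairs = frequent_pairs(docs, min_support)
    profiles = [pair_profile(d, pairs) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        sims = [jaccard(profiles[i], profiles[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        # Merge the two most similar clusters; x < y, so deleting y is safe.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

Because each document is represented only by the frequent pairs it contains, rather than by a full term vector, the representation is much smaller than the vocabulary. This is the kind of dimensionality reduction the abstract refers to, though the paper's actual document model may differ.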
