2016 International Conference on Information Systems and Artificial Intelligence

Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence



Abstract

The theme of a text is the key to text clustering, and co-occurring words express the theme of a document particularly strongly. Building on an in-depth study of text theme mining and word co-occurrence, this paper proposes a text clustering algorithm based on semantic text representation and the graph structure of word co-occurrence. First, the algorithm constructs a graph structure for each text from the co-occurrence of its feature words; in other words, every text is represented as a graph. It then uses the maximum common subgraph of two texts to calculate their similarity and combines this measure with the K-means clustering algorithm to cluster the documents. Experimental comparisons with a hierarchical clustering algorithm show that K-means clustering based on word co-occurrence graph structures greatly reduces both the high dimensionality of the text vectors and the algorithmic complexity, significantly improves the efficiency and accuracy of text clustering, and produces good-quality clusters.
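
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: words are linked when they share a sentence, similarity is the size of the common subgraph (nodes plus edges) divided by the size of the larger graph, and cluster centers are medoids (the most central member document) swapped in for K-means centroids, since a mean of graphs is not defined here. The function names `cooccurrence_graph`, `mcs_similarity`, and `cluster` are hypothetical.

```python
from itertools import combinations


def cooccurrence_graph(sentences):
    """Build a (nodes, edges) word graph from tokenized sentences.

    Assumption: two feature words co-occur when they share a sentence.
    """
    nodes, edges = set(), set()
    for words in sentences:
        unique = sorted(set(words))
        nodes.update(unique)
        # Sorted input means each undirected edge has one canonical form.
        edges.update(combinations(unique, 2))
    return nodes, edges


def mcs_similarity(g1, g2):
    """Similarity based on the maximum common subgraph of two word graphs.

    Because every node carries a unique word label, the maximum common
    subgraph is simply the shared nodes and shared edges.
    Assumption: size = nodes + edges, normalized by the larger graph.
    """
    (n1, e1), (n2, e2) = g1, g2
    common = len(n1 & n2) + len(e1 & e2)
    largest = max(len(n1) + len(e1), len(n2) + len(e2))
    return common / largest if largest else 0.0


def cluster(graphs, k, iterations=10):
    """K-means-style loop that uses medoids as centers (an assumption)."""
    centers = list(range(k))  # take the first k documents as initial centers
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each document to its most similar center.
        labels = [
            max(range(k), key=lambda c: mcs_similarity(g, graphs[centers[c]]))
            for g in graphs
        ]
        # Update step: the new center is the member most similar to the rest.
        new_centers = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep an empty cluster's center
                continue
            new_centers.append(max(
                members,
                key=lambda i: sum(
                    mcs_similarity(graphs[i], graphs[j]) for j in members
                ),
            ))
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Toy usage: four one-sentence documents on two topics.
docs = [
    [["graph", "cluster", "text"]],
    [["stock", "market", "price"]],
    [["text", "graph", "cluster", "word"]],
    [["market", "price", "trade"]],
]
graphs = [cooccurrence_graph(d) for d in docs]
labels = cluster(graphs, k=2)
```

Representing each document by sets of words and word pairs avoids building a vector over the whole vocabulary, which is one way to read the abstract's claim about reduced dimensionality.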
