International Conference on Information and Knowledge Engineering

Automatic Document Clustering Based on Keyword Clusters Using Partitions of Weighted Digraphs



Abstract

This paper proposes a new document clustering approach based on partitioning weighted directed graphs (digraphs). First, natural language processing and feature selection techniques are used to remove words that are uninformative for document clustering. Only the remaining useful keywords are extracted, and the association strengths between them are computed; restricting the computation to these keywords can greatly reduce the time and space complexity of the clustering algorithm. Next, the extracted keywords are treated as nodes, and each association strength is used as the weight of an arc from a keyword to an associated keyword, which yields a weighted keyword digraph. The strongly connected components of this digraph are then explored heuristically; these components represent the keyword clusters of the document collection. Finally, each document is assigned to a cluster according to the similarity between its keywords and the keywords of each keyword cluster. Experiments show that using keyword clusters for automatic document clustering can achieve a high clustering precision rate.
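The abstract does not specify the association measure, the heuristic used to explore components, or the document-to-cluster similarity, so the following is only a minimal sketch of the overall pipeline under hypothetical choices. It assumes each document has already been reduced to a list of keywords (the NLP and feature selection step is not shown). The arc weight `w(a -> b) = df(a, b) / df(a)` (the fraction of documents containing `a` that also contain `b`), the threshold `min_strength`, the minimum cluster size `min_size`, and the overlap score are all illustrative assumptions, and exact strongly connected components are computed with Tarjan's algorithm in place of the paper's heuristic exploration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def association_digraph(docs, min_strength=0.5):
    """Build a weighted keyword digraph (hypothetical weighting).

    w(a -> b) = df(a, b) / df(a): the share of documents containing a
    that also contain b. The measure is asymmetric, so arcs are directed.
    Arcs weaker than min_strength are dropped.
    """
    df = Counter()   # document frequency of each keyword
    co = Counter()   # co-occurrence counts of ordered keyword pairs
    for doc in docs:
        kws = set(doc)
        df.update(kws)
        co.update(permutations(kws, 2))  # ordered pairs (a, b) with a != b
    graph = defaultdict(dict)
    for (a, b), n in co.items():
        weight = n / df[a]
        if weight >= min_strength:
            graph[a][b] = weight
    return graph, set(df)


def strongly_connected_components(nodes, graph):
    """Tarjan's algorithm (recursive; fine for small keyword graphs)."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, {}):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def build_keyword_clusters(docs, min_strength=0.5, min_size=2):
    """Keyword clusters = SCCs with at least min_size keywords (assumption)."""
    graph, nodes = association_digraph(docs, min_strength)
    sccs = strongly_connected_components(nodes, graph)
    return [c for c in sccs if len(c) >= min_size]


def cluster_documents(docs, clusters):
    """Assign each document to the cluster covering the largest fraction of
    its keywords (hypothetical similarity); -1 means no overlap at all."""
    labels = []
    for doc in docs:
        kws = set(doc)
        best, best_score = -1, 0.0
        for i, cluster in enumerate(clusters):
            score = len(kws & cluster) / len(kws) if kws else 0.0
            if score > best_score:
                best, best_score = i, score
        labels.append(best)
    return labels


if __name__ == "__main__":
    docs = [
        ["graph", "digraph", "arc"],
        ["graph", "digraph", "node"],
        ["digraph", "graph", "arc", "node"],
        ["protein", "gene"],
        ["gene", "protein", "cell"],
        ["cell", "gene", "protein"],
    ]
    clusters = build_keyword_clusters(docs)
    print(clusters)
    print(cluster_documents(docs, clusters))
```

On this toy collection the graph-theory keywords and the biology keywords never co-occur, so they form two separate strongly connected components, and each document is assigned to the component that shares its keywords. A directed measure is used because association between keywords is generally asymmetric (a rare keyword may strongly imply a common one but not the reverse), which is what makes strongly connected components, rather than plain connected components, a meaningful grouping.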
