Document Clustering Method Based on Frequent Co-occurring Words

Abstract

This paper presents a new document clustering method based on frequent co-occurring words. We first apply Singular Value Decomposition (SVD) and then group the words into clusters called word representatives, which replace the corresponding words in the original documents. Next, we extract the frequent word representative sets with the Apriori algorithm. Each document is then assigned to a basic unit described by a frequent word representative set, and the final clusters are obtained from these basic units by hierarchical clustering. The major advantage of our method is that the clustering process itself yields a description of each cluster, first in terms of the frequent word representatives and then in terms of the corresponding words, without any extra work. Compared with the state-of-the-art UPGMA method on benchmark datasets, our method performs better in terms of entropy and cluster purity.
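The frequent-set extraction step can be illustrated with a small sketch. It is not the authors' implementation: the word-to-representative mapping `WORD_TO_REP`, the representative names, the toy documents, and the support threshold are all hypothetical, and the mapping is hard-coded here rather than derived from SVD and word clustering. The sketch only shows how a plain Apriori pass could find sets of word representatives that co-occur in at least `min_support` documents.

```python
from itertools import combinations

# Hypothetical word -> representative mapping. In the paper this would come
# from clustering words after SVD; here it is hard-coded for illustration.
WORD_TO_REP = {
    "car": "R_vehicle", "truck": "R_vehicle",
    "engine": "R_motor", "motor": "R_motor",
    "bank": "R_finance", "loan": "R_finance",
    "rate": "R_interest", "interest": "R_interest",
}


def to_representatives(doc_words):
    """Replace each known word by its representative; drop unknown words."""
    return frozenset(WORD_TO_REP[w] for w in doc_words if w in WORD_TO_REP)


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in >= min_support transactions."""
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Toy documents (assumed data, not from the paper's benchmark datasets).
docs = [
    ["car", "engine", "loan"],
    ["truck", "motor"],
    ["bank", "rate", "loan"],
    ["interest", "bank", "car"],
]
transactions = [to_representatives(d) for d in docs]
frequent_sets = apriori(transactions, min_support=2)

for itemset, support in sorted(frequent_sets.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy run, "car" and "truck" count as the same item because they share a representative, so the pair `R_vehicle`, `R_motor` is frequent even though no two documents share the exact same words. How documents are then assigned to basic units and merged hierarchically is not specified in the abstract and is left out of the sketch.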