Document Clustering Method Based on Frequent Co-occurring Words



Abstract

This paper presents a new document clustering method based on frequent co-occurring words. We first apply Singular Value Decomposition and then group the words into clusters, called word representatives, which substitute for the corresponding words in the original documents. Next, we extract the frequent word representative sets with Apriori. Each document is then assigned to a basic unit described by a frequent word representative set, and the final clusters are obtained from these units by hierarchical clustering. The major advantage of our method is that it produces cluster descriptions, first in terms of the frequent word representatives and then in terms of the corresponding words, during the clustering process without any extra work. Compared with the state-of-the-art UPGMA method on benchmark datasets, our method performs better in terms of entropy and cluster purity.
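
The Apriori step in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `apriori`, the absolute threshold `min_support`, and the toy labels `r1` to `r3` (standing in for word representatives) are assumptions made for illustration, and the abstract does not state which support threshold the paper uses.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many documents contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical documents, each reduced to its set of word representatives.
docs = [
    {"r1", "r2", "r3"},
    {"r1", "r2"},
    {"r2", "r3"},
    {"r1", "r2", "r3"},
]
frequent = apriori(docs, min_support=3)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy input, the pair `{r1, r3}` appears in only two documents, so it is dropped, and the triple `{r1, r2, r3}` is pruned without being counted because one of its subsets is infrequent. Each surviving frequent set could then serve as the description of one basic unit before hierarchical clustering.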


Bibliographic record

Source
Pacific Asia Conference on Language, Information and Computation; November 1-3, 2006; Wuhan (CN) | 2006 | pp. 442-445 | 4 pages
Conference location: Wuhan (CN)
Authors
Ye-Hang Zhu; Guan-Zhong Dai; Benjamin C. M. Fung; De-Jun Mu;
Author affiliations

College of Automation, Northwestern Polytechnical University, Xi'an 710072, China;

School of Computing Science, Simon Fraser University, BC, Canada, V5A 1S6;

Original format: PDF
Text language: English
Chinese Library Classification: Computer networks;
Keywords
document clustering; text clustering; frequent itemsets; apriori;

Date indexed: 2022-08-26 14:20:59

Similar literature


1. Text document clustering based on frequent word meaning sequences [J]. Yanjun Li, Soon M. Chung, John D. Holt. Data & Knowledge Engineering. 2008, No. 1

2. An Approach to Improve Quality of Document Clustering by Word Set Based Documenting Clustering Algorithm [J]. Sandeep Sharma, Ruchi Dave, Naveen Hemrajani. Oriental journal of computer science and technology. 2011, No. 2

3. Text Document Retrieval through Clustering using Meaningful Frequent Ordered Word Patterns [J]. Pushpalatha K. P., G. Raju. International Journal of Applied Engineering Research. 2018, No. 7aPta2

4. Document Clustering Method Based on Frequent Co-occurring Words [C]. Ye-Hang Zhu, Guan-Zhong Dai, Benjamin C. M. Fung. Pacific Asia Conference on Language, Information and Computation. 2006

5. Clustering Web documents: A phrase-based method for grouping search engine results [D]. Zamir, Oren Eli. 1999

6. Document vectorization method using network information of words [O]. Sang Yup Lee. 2012

7. Document Clustering Method Based on Frequent Co-occurring Words [O]. Zhu Ye-Hang, Dai Guan-Zhong, Fung Benjamin C. M. 2006
