Automatic Document Clustering Based on Keyword Clusters Using Partitions of Weighted Digraphs

Abstract

This paper proposes a new document clustering approach based on partitioning weighted directed graphs (digraphs). First, natural language processing and feature selection techniques remove the words that are useless for document clustering. Only the useful keywords are then extracted and the association strengths between them are computed, which greatly reduces the time and space complexity of the clustering algorithm. Next, the extracted keywords are treated as nodes, and the association strengths are used as the weights on the arcs from keywords to their associated keywords, yielding a weighted digraph. The strongly connected components of this keyword digraph are explored heuristically; these components represent the keyword clusters of the document collection. Each document is then clustered according to the similarity between its keywords and those of each keyword cluster. The experiments show that using keyword clusters in automatic document clustering can yield high clustering precision.
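The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the association strength, the heuristic used to explore components, or the document-to-cluster similarity, so everything below is an assumption made for illustration: the strength of the arc a -> b is taken as the fraction of documents containing a that also contain b, arcs below a hypothetical `threshold` are dropped, strongly connected components are found exactly with Kosaraju's algorithm (not the authors' heuristic), and documents are assigned by Jaccard similarity. All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def association_digraph(docs, threshold=0.5):
    """Build arcs a -> b weighted by the ASSUMED strength df(a, b) / df(a)."""
    df = defaultdict(int)      # documents containing a keyword
    co_df = defaultdict(int)   # documents containing both keywords of a pair
    for keywords in docs.values():
        for k in keywords:
            df[k] += 1
        for a, b in permutations(sorted(keywords), 2):
            co_df[(a, b)] += 1
    graph = {k: {} for k in df}
    for (a, b), n in co_df.items():
        weight = n / df[a]
        if weight >= threshold:  # keep only sufficiently strong arcs
            graph[a][b] = weight
    return graph


def strongly_connected_components(graph):
    """Exact SCCs via Kosaraju's algorithm (stand-in for the paper's heuristic)."""
    seen = set()

    def visit(start, adjacency, out):
        # Iterative depth-first search that appends nodes in post-order.
        seen.add(start)
        stack = [(start, iter(adjacency[start]))]
        while stack:
            current, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adjacency[nxt])))
                    break
            else:
                stack.pop()
                out.append(current)

    order = []
    for node in graph:
        if node not in seen:
            visit(node, graph, order)

    reverse = {k: [] for k in graph}
    for a, targets in graph.items():
        for b in targets:
            reverse[b].append(a)

    seen.clear()
    components = []
    for node in reversed(order):
        if node not in seen:
            component = []
            visit(node, reverse, component)
            components.append(set(component))
    return components


def assign_documents(docs, clusters):
    """Map each document to the index of its most similar keyword cluster."""
    assignment = {}
    for doc_id, keywords in docs.items():
        # ASSUMED similarity: Jaccard overlap between keyword sets.
        scores = [len(keywords & c) / len(keywords | c) for c in clusters]
        assignment[doc_id] = max(range(len(clusters)), key=scores.__getitem__)
    return assignment


# Toy collection: each document is already reduced to its extracted keywords.
docs = {
    "d1": {"graph", "node", "arc"},
    "d2": {"graph", "node"},
    "d3": {"graph", "node", "arc"},
    "d4": {"query", "index"},
    "d5": {"query", "index", "ranking"},
}
clusters = strongly_connected_components(association_digraph(docs, threshold=0.6))
assignment = assign_documents(docs, clusters)
for doc_id, idx in sorted(assignment.items()):
    print(doc_id, sorted(clusters[idx]))
```

Because the arcs are directed, a keyword that points strongly to others but receives no strong arcs in return (here `ranking`) ends up in its own component, which is the property that distinguishes strongly connected components from plain connected ones.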

Bibliographic record

Source: International Conference on Information and Knowledge Engineering, 2003, 6 pages
Authors: Hsi-Cheng Chang; Chiun-Chieh Hsu; Chi-Kai Chan
Original format: PDF
Chinese Library Classification: Information and communication theory
Keywords: document clustering; information retrieval; weighted digraph

Similar literature

1. Automatic document clustering based on keyword clusters using partitions of weighted digraphs [J]. Hsi-Cheng Chang, Chiun-Chieh Hsu, Chi-Kai Chan. International Journal of Computer Systems Science & Engineering, 2004, No. 1.
2. Using Topic Keyword Clusters for Automatic Document Clustering [J]. Hsi-Cheng Chang, Chiun-Chieh Hsu. IEICE Transactions on Information and Systems, 2005, No. 8.
3. Comparison Of Keyword Based Clustering Of Web Documents By Using Openstack 4j And By Traditional Method [J]. Shiza Anand, Dr. Mukesh Rawat. International Journal of Scientific & Technology Research, 2016, No. 8.
4. Automatic Document Clustering Based on Keyword Clusters Using Partitions of Weighted Digraphs [C]. Hsi-Cheng Chang, Chiun-Chieh Hsu, Chi-Kai Chan. Proceedings of the International Conference on Information and Knowledge Engineering (IKE'03), 2003.
5. Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters [D]. Li, Xiaoxiao. 2012.
6. Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering algorithms [O]. Etienne Lord, Abdoulaye Baniré Diallo, Vladimir Makarenkov. 2015.
7. DOCUMENT CLUSTERING USING AGGLOMERATIVE HIERARCHICAL CLUSTERING APPROACH (AHDC) AND PROPOSED TSG KEYWORD EXTRACTION METHOD [O]. R. Nagarajan. 2016.
8. Soft Clustering Criterion Functions for Partitional Document Clustering [R]. Zhao, Y., Karypis, G. 2004.
