WordNet-Based and N-Grams-Based Document Clustering: A Comparative Study

Abstract

Many unsupervised classification methods, also called clustering methods, have been applied to textual documents. In this paper, we first propose using Kohonen's self-organizing maps (SOM) to cluster textual documents represented by n-grams. We then study the same method using WordNet synsets as the terms that represent the documents. The effects of these methods are examined in several experiments using four similarity measures: the cosine distance, the Euclidean distance, the squared Euclidean distance and the Manhattan distance. The Reuters-21578 corpus is used for evaluation, with the F-measure and entropy as evaluation criteria. The results show that, although the n-gram method already performs well, adding lexical knowledge to the representation yields a better classification.
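The four measures named in the abstract are standard, so a short sketch may help readers compare them. The Python below builds n-gram frequency vectors for two documents and computes each distance. It is an illustration under stated assumptions, not the authors' implementation: it assumes character trigrams, raw counts as term weights, and cosine distance defined as 1 minus cosine similarity. All function names are hypothetical, including `best_matching_unit`, which assumes the chosen measure is what selects the winning SOM node.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (assumption: lowercased, n=3 by default)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vectors(a: Counter, b: Counter) -> tuple:
    """Align two sparse n-gram profiles on their combined, sorted vocabulary."""
    vocab = sorted(set(a) | set(b))
    return [float(a[g]) for g in vocab], [float(b[g]) for g in vocab]


def cosine_distance(x, y):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / norm if norm else 1.0


def squared_euclidean(x, y):
    """Sum of squared coordinate differences."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def euclidean(x, y):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(x, y))


def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def best_matching_unit(x, weights, dist):
    """Hypothetical helper: index of the SOM node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda k: dist(x, weights[k]))


if __name__ == "__main__":
    x, y = to_vectors(char_ngrams("oil prices rise"), char_ngrams("oil prices fall"))
    for name, fn in [("cosine", cosine_distance), ("euclidean", euclidean),
                     ("squared euclidean", squared_euclidean), ("manhattan", manhattan)]:
        print(f"{name}: {fn(x, y):.3f}")
```

Swapping the `dist` argument of `best_matching_unit` is one way to reproduce the kind of comparison the abstract describes: the SOM training loop stays the same while only the measure changes. With the WordNet variant, the vector coordinates would be synset counts instead of n-gram counts.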

Bibliographic Details

Source
International Conference on Broadband Communications, Informatics and Biomedical Applications, 2008, 8 pages
Authors
Amine Abdelmalek; Elberrichi Zakaria; Simonet Michel; Malki Mimoun
Original Format
PDF
CLC Classification
TN91-53
Keywords
Document clustering; WordNet; n-grams; Reuters-21578; Kohonen self-organizing maps; similarity

Similar Documents

1. Feature selection methods for document clustering: a comparative study and a hybrid solution [J]. Asmaa Benghabrit, Brahim Ouhbi, Bouchra Frikh. International Journal of Data Analysis Techniques and Strategies, 2019, no. 3.

2. Comparative Study of Clustering Algorithms using OverallSimSUX Similarity Function for XML Documents [J]. Damny Magdaleno, Yadriel Miranda, Ivett E. Fuentes. Inteligencia Artificial: Ibero-American Journal of Artificial Intelligence, 2015, no. 55.

3. A Comparative Study to Find a Suitable Method for Text Document Clustering [J]. S. C. Punitha, M. Punithavalli. International Journal of Computer Science and Network Security, 2012, no. 10.

4. WordNet-Based and N-Grams-Based Document Clustering: A Comparative Study [C]. Amine Abdelmalek, Elberrichi Zakaria, Simonet Michel. International Conference on Broadband Communications, Informatics and Biomedical Applications, 2008.

5. A comparative study on ontology generation and text clustering using VSM, LSI, and document ontology models [D]. William P. Taylor II, 2007.

6. Document Clustering of Clinical Narratives: a Systematic Study of Clinical Sublanguages [O]. Olga Patterson, John F. Hurdle, 2011.

7. WordNet-based Text Document Clustering [O]. Julian Sedding, Dimitar Kazakov, 2010.