Clustering Documents based on Semantic Similarity using HAC and K-Mean Algorithms

Abstract

The continuing success of the Internet has greatly increased the number of text documents in electronic formats, and techniques for grouping these documents into meaningful collections have become mission-critical. Traditional methods group documents by statistical features and rely on syntactic rather than semantic information. This article introduces a new method for grouping documents based on semantic similarity. Document summaries are first identified in the Wikipedia and IMDB datasets and then derived using the NLTK dictionary. A vector space model is then built with TF-IDF, and clustering is performed with the HAC and K-means algorithms. The results of the two algorithms are compared and visualized on an interactive webpage.
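The abstract names a pipeline of preprocessing, a TF-IDF vector space, and clustering with HAC and K-means, but it gives no implementation details. The following is a minimal, dependency-free Python sketch of that general pipeline. It is not the authors' code. Several choices here are assumptions rather than the paper's settings or data:

- NLTK preprocessing is replaced by a regex tokenizer with a small hypothetical stopword list.
- The idf term uses a smoothed variant.
- K-means uses Euclidean distance with deterministic farthest-first seeding.
- HAC uses average linkage on cosine similarity.
- The four documents are toy strings.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list standing in for NLTK resources (an assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "in", "is", "on", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Smoothed idf (assumed variant): log((1 + n) / (1 + df)) + 1 keeps weights positive.
        vec = {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity; vectors are unit length, so this is the dot product."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def sq_dist(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in u.keys() | v.keys())


def centroid(members):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=50):
    """Lloyd's K-means; seeding is deterministic farthest-first (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assign each document to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


def hac(vectors, k):
    """Average-linkage agglomerative clustering on cosine similarity, stopped at k clusters."""
    sim = [[cosine(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                score = sum(pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # merge the most similar pair (b > a)
    labels = [0] * len(vectors)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


docs = [  # toy documents, not the Wikipedia/IMDB data
    "The film is a thriller about a detective",
    "A detective thriller film with a twist",
    "The encyclopedia article on clustering algorithms",
    "Clustering algorithms group documents in an encyclopedia",
]
vecs = tfidf_vectors(docs)
print("K-means:", kmeans(vecs, 2))  # → [0, 0, 1, 1]
print("HAC:", hac(vecs, 2))         # → [0, 0, 1, 1]
```

Average linkage and farthest-first seeding are used here only to keep the toy example deterministic. The abstract does not state which linkage criterion or initialization the paper used.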

Bibliographic Record

Source: International Conference on Advanced Science and Engineering, 2020, pp. 205-210 (6 pages)
Authors: Karwan Jacksi; Rowaida Kh. Ibrahim; Subhi R. M. Zeebaree; Rizgar R. Zebari; Mohammed A. M. Sadeeq
Original format: PDF
Keywords: Measurement; Electronic publishing; Visualization; Semantics; Mission critical systems; Clustering algorithms; Encyclopedias

Similar Documents

1. MLK-Means - A Hybrid Machine Learning based K-Means Clustering Algorithms for Document Clustering [J]. P. Perumal, R. Nedunchezhian. WSEAS Transactions on Information Science and Applications, 2012, no. 7a9.
2. An Extensive Study of Similarity and Dissimilarity Measures Used for Text Document Clustering using K-means Algorithm [J]. Maedeh Afzali, Suresh Kumar. International Journal of Information Technology and Computer Science, 2018, no. 9.
3. The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization [J]. Ida Bagus Gede Sarasvananda, Retantyo Wardoyo, Anny Kartika Sari. Indonesian Journal of Computing and Cybernetics Systems, 2019, no. 4.
4. Semantic Document Clustering using K-means algorithm and Ward's Method [C]. Niyaz M. Salih, Karwan Jacksi. International Conference on Advanced Science and Engineering, 2020.
5. Study of document clustering using the k-means algorithm [D]. Gummuluru, Meghna Sharma. 2006.
6. CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques [O]. Yun Zuo, Jianyuan Lin, Xiangxiang Zeng. 2021.
7. Document Clustering Using K-Means with Term Weighting as Similarity-Based Constraints [O]. Uraiwan Buatoom, Waree Kongprawechnon, Thanaruk Theeramunkong. 2020.