Fine-Tuning an Algorithm for Semantic Document Clustering Using a Similarity Graph

Lubomir Stanchev


Abstract

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains 11,362 newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, F-score, entropy, and purity.
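As an illustration of two of the evaluation metrics named above, the following is a minimal sketch of purity and entropy for a flat clustering. It uses the common textbook definitions (majority-class fraction for purity, size-weighted Shannon entropy in bits for entropy); the article's exact formulas are not given in the abstract, so these definitions, the function name `purity_and_entropy`, and the toy labels are assumptions.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Assumed definitions (not taken from the article):
      purity  = (1/N) * sum over clusters of the largest class count
      entropy = sum over clusters of (|cluster|/N) * H(class distribution),
                with H measured in bits (log base 2)
    """
    if len(cluster_ids) != len(true_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how many documents of each reference class land in each cluster.
    class_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        class_counts[cluster][label] += 1

    n = len(cluster_ids)
    majority_total = 0
    entropy = 0.0
    for counts in class_counts.values():
        size = sum(counts.values())
        majority_total += max(counts.values())
        cluster_entropy = -sum(
            (k / size) * math.log2(k / size) for k in counts.values()
        )
        entropy += (size / n) * cluster_entropy

    return majority_total / n, entropy


if __name__ == "__main__":
    # Hypothetical toy data: six documents, two clusters, two categories.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["grain", "grain", "trade", "trade", "trade", "trade"]
    purity, entropy = purity_and_entropy(clusters, labels)
    print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that agree more closely with the human-assigned categories. Some authors additionally normalize entropy by the logarithm of the number of categories; that variant is not shown here.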


Bibliographic details

Source: International journal of semantic computing, 2016, Issue 4, 1 page
Author: Lubomir Stanchev
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Semantic search; semantic graph; document clustering
