International Journal of Semantic Computing

Fine-Tuning an Algorithm for Semantic Document Clustering Using a Similarity Graph


Abstract

In this article, we examine an algorithm for document clustering that uses a similarity graph. The graph stores English words and common phrases as nodes and can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Because our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning their values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains 11,362 newswire stories grouped into 82 categories by human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, F-score, entropy, and purity.
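Two of the evaluation metrics named above, purity and entropy, can be illustrated with a minimal sketch. It assumes the common textbook definitions (purity as the fraction of documents that fall in the majority category of their cluster, entropy as the size-weighted average of per-cluster category entropy in bits); the abstract does not state the exact formulas used in the article. The function name `purity_and_entropy` and the toy category labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference labels.

    Assumed definitions (not taken from the article):
      purity  = (1/N) * sum over clusters of the majority-category count
      entropy = sum over clusters of (cluster size / N) * H(categories in cluster)
    where H is the Shannon entropy in bits.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one document is required")

    # Count how often each reference category occurs in each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster[cluster][label] += 1

    majority_total = 0
    weighted_entropy = 0.0
    for counts in per_cluster.values():
        size = sum(counts.values())
        majority_total += max(counts.values())
        # Shannon entropy of the category distribution inside this cluster.
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        weighted_entropy += (size / n) * h

    return majority_total / n, weighted_entropy


if __name__ == "__main__":
    # Toy data: six documents, two clusters, two hypothetical categories.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["grain", "grain", "trade", "trade", "trade", "trade"]
    purity, entropy = purity_and_entropy(clusters, labels)
    print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the human-assigned categories. Purity alone rewards many small clusters, which is one reason to report it alongside the other metrics.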
