The effectiveness and efficiency of clustering in Arabic information retrieval systems.

Abstract

This dissertation explores three approaches to clustering documents: complete-link clustering, group-average clustering, and single-link clustering. A series of information retrieval experiments was carried out on two corpora: a collection of 242 abstracts of papers in computer science and a newspaper corpus of 187 articles of varying length. Each clustering method was tested three times: once using full words as index terms, once using stems, and once using roots.

With complete-link clustering, using full words as index terms resulted in significantly better retrieval performance than using roots, and roots in turn produced significantly better results than stems. With group-average clustering, full words again gave significantly better results than roots, and roots gave significantly better results than stems except at the recall level of 1.0. With single-link clustering, full words produced significantly better results than roots at the lower recall levels (up to 0.4) and significantly better results than stems at the lower recall levels (up to 0.6). At the higher recall levels, however, roots and stems performed significantly better than full words.
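
The three clustering methods named in the abstract differ only in how the distance between two clusters is derived from the distances between their member documents. The sketch below illustrates that difference on toy data and is not the dissertation's implementation. The abstract does not state the similarity measure, so cosine distance over term counts is an assumption, as is reading "group-average" as the mean of the pairwise distances between two clusters. The helper names (`cosine_distance`, `agglomerate`, `LINKAGES`) and the toy index terms are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two term-count dicts (assumed measure)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# How a list of pairwise document distances becomes one cluster distance.
LINKAGES = {
    "single": min,                            # closest pair of documents
    "complete": max,                          # farthest pair of documents
    "average": lambda ds: sum(ds) / len(ds),  # mean over all cross-cluster pairs
}


def agglomerate(docs, linkage, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Returns a sorted list of clusters, each a sorted list of document indices.
    """
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairwise = [
                cosine_distance(docs[x], docs[y])
                for x in clusters[i]
                for y in clusters[j]
            ]
            d = combine(pairwise)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters)


# Toy documents as term-count dicts; the index terms are placeholders and
# could equally be full words, stems, or roots.
docs = [
    {"cluster": 2, "index": 1},
    {"cluster": 1, "index": 2},
    {"news": 2, "sport": 1},
    {"news": 1, "sport": 2},
]

for name in LINKAGES:
    print(name, agglomerate(docs, name, 2))
```

On this small, well-separated example all three linkage rules produce the same two clusters. They diverge on less clean data, where single-link tends to chain documents together and complete-link favors compact groups.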