
Metrics for information retrieval: A case study



Abstract

Information retrieval (IR) makes extensive use of clustering methods. Clustering is a technique that groups a set of documents into clusters or subsets. Extracting relevant documents from the World Wide Web efficiently and effectively remains a challenging problem. In this work, we compare and analyse the effectiveness of similarity measures such as City Block distance, Cosine similarity, Point symmetry distance and the Dice coefficient for improving document clustering, both with and without an ontology. The work has two objectives: to compare metrics in the domain, and to study the impact of methods such as ontology comparison and clustering on the metrics as a whole. This should lead to further refinement of the metrics for current and future needs in the domain. Earlier work in the domain reported that the similarity measures yield more or less the same results. However, our work shows that ontology-based clustering produces marked changes in the results. The results show the need for more work focused on the metrics aspect of information retrieval.
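As an illustration of three of the measures named in the abstract, the following is a minimal sketch in Python. It assumes documents are represented as equal-length term-frequency vectors and uses the vector form of the Dice coefficient; the function names and the sample vectors `doc_a` and `doc_b` are hypothetical and do not come from the paper. Point symmetry distance is omitted because it depends on a cluster centroid and on the other points in the cluster.

```python
import math


def city_block_distance(x, y):
    """City Block (Manhattan, L1) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def dice_coefficient(x, y):
    """Vector form of the Dice coefficient: 2 * (x . y) / (|x|^2 + |y|^2)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y)
    if denom == 0:
        return 0.0
    return 2 * dot / denom


# Hypothetical term-frequency vectors over a four-term vocabulary.
doc_a = [1, 1, 0, 2]
doc_b = [1, 0, 1, 2]

print(city_block_distance(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_b))
print(dice_coefficient(doc_a, doc_b))
```

Note that City Block is a distance (smaller means more similar), while Cosine and Dice are similarities (larger means more similar), so a clustering routine would need to treat them accordingly.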
