Journal: Aslib Journal of Information Management

Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems



Abstract

Purpose - The purpose of this paper is to propose a method that yields more accurate results when comparing the performance of paired information retrieval (IR) systems than the current method, which is based on the systems' mean effectiveness scores across a set of identified topics/queries.

Design/methodology/approach - In the proposed approach, document-level scores, rather than the classic set of topic scores, are used as the evaluation unit. These document scores are the defined documents' weights, which take the place of the systems' mean average precision (MAP) scores as the statistic in the significance test. The experiments were conducted using the TREC 9 Web track collection.

Findings - The p-values generated by two types of significance tests, namely Student's t-test and the Mann-Whitney test, show that when document-level scores are used as the evaluation unit, the difference between IR systems is more significant than when topic scores are used.

Originality/value - Using a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will help IR researchers evaluate their retrieval systems and algorithms more accurately.
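The significance-testing step named in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not define how document weights are computed, so the sketch simply takes two lists of scores (per topic or per document) as given. The function names and the sample scores are hypothetical, the t-test is assumed to be the paired variant, and the Mann-Whitney p-value uses the normal approximation without a tie correction.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for Student's paired t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value via normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


if __name__ == "__main__":
    # Hypothetical scores for two systems over the same six evaluation units.
    system_a = [0.42, 0.31, 0.55, 0.18, 0.47, 0.29]
    system_b = [0.35, 0.28, 0.50, 0.20, 0.40, 0.25]
    print("paired t, df:", paired_t_statistic(system_a, system_b))
    print("Mann-Whitney U, p:", mann_whitney_u(system_a, system_b))
```

The same two functions apply whether the evaluation unit is a topic or a document; only the input lists change. The normal approximation is rough for small samples, so for real comparisons an exact test from a statistics library would be the safer choice.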
