Information Processing & Management

Effectiveness evaluation without human relevance judgments: A systematic analysis of existing methods and of their combinations


Abstract

In test-collection-based evaluation of retrieval effectiveness, it has been suggested that human relevance judgments can be avoided entirely. Although several methods have been proposed, their accuracy is still limited. In this paper we present two overall contributions. First, we provide a systematic comparison of all of the most widely adopted previous approaches on a large set of 14 TREC collections. We aim to analyze the methods in a homogeneous and complete way, in terms of both the accuracy measures used and the datasets selected, showing that considerably different results may be obtained depending on the method, dataset, and measure considered. Second, we study the combination of such methods, which, to the best of our knowledge, has not been investigated so far. Our experimental results show that simple combination strategies based on data fusion techniques are usually not effective and can even be harmful. However, some more sophisticated solutions, based on machine learning, are indeed effective and often outperform all individual methods. Moreover, they are more stable, as they show a smaller variation across datasets. Our results have the practical implication that, when trying to automatically evaluate retrieval effectiveness, researchers should not use a single method, but a (machine-learning-based) combination of them.
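To make the setting concrete, here is a minimal sketch of how the accuracy of a judgment-free evaluation method might be scored, and of what a simple fusion-style combination could look like. The abstract names neither the accuracy measures nor the fusion techniques used, so everything below is an assumption for illustration only: Kendall's tau (tau-a, without tie correction) as the accuracy measure, the mean of min-max-normalized scores as the fusion rule, and all function names and numbers, which are hypothetical toy values rather than results from the paper.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def min_max(scores):
    """Rescale scores to [0, 1] so methods on different scales can be averaged."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_by_mean(estimates):
    """Hypothetical simple fusion: average the normalized estimates per system."""
    normalized = [min_max(e) for e in estimates]
    return [sum(column) / len(column) for column in zip(*normalized)]


# Toy, made-up effectiveness scores for five retrieval systems.
true_scores = [0.30, 0.25, 0.20, 0.15, 0.10]  # e.g. from human judgments
method_a = [0.50, 0.40, 0.45, 0.20, 0.10]     # estimates from one method
method_b = [0.90, 0.20, 0.50, 0.40, 0.30]     # estimates from another method

fused = fuse_by_mean([method_a, method_b])

tau_a = kendall_tau(true_scores, method_a)
tau_b = kendall_tau(true_scores, method_b)
tau_fused = kendall_tau(true_scores, fused)
print(tau_a, tau_b, tau_fused)
```

A higher tau means the estimated ranking of systems agrees more closely with the ranking obtained from human judgments. In this toy example the fused estimate does not beat the better individual method, which is only an illustration of how such a comparison is set up, not evidence for the paper's findings.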