Conference: European Conference on Information Retrieval

System Effect Estimation by Sharding: A Comparison Between ANOVA Approaches to Detect Significant Differences


Abstract

The ultimate goal of evaluation is to understand when two IR systems are (significantly) different. To this end, many comparison procedures have been developed over time. However, to date, most reproducibility efforts have focused only on reproducing systems and algorithms, almost entirely neglecting the reproducibility of the methods we use to compare our systems. In this paper, we focus on methods based on ANalysis Of VAriance (ANOVA), which explicitly model the data in terms of different contributing effects, allowing us to obtain a more accurate estimate of significant differences. In this context, recent studies have shown how sharding the corpus can further improve the estimation of the system effect. We replicate and compare two pairs of approaches: methods based on "traditional" ANOVA (tANOVA) against those based on a bootstrapped version of ANOVA (bANOVA), and multiple-comparison procedures that control the more conservative Family-wise Error Rate (FWER) against those that control the more lenient False Discovery Rate (FDR). We found that bANOVA shows an overall good degree of reproducibility, with some limitations concerning the confidence intervals. Moreover, compared to the tANOVA approaches, bANOVA has greater statistical power, at the cost of lower stability. Overall, with this work, we aim to shift the focus of reproducibility from systems alone to the methods we use to compare and analyze their performance.
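To make the contrast between FWER and FDR control concrete, the following minimal sketch applies one correction of each kind to the same set of p-values. It is an illustration under stated assumptions, not the procedure used in the paper: the Bonferroni correction stands in for an FWER-controlling approach, the Benjamini-Hochberg procedure for an FDR-controlling one, and the p-values and function names are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """FWER control (Bonferroni): reject H0_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up procedure).

    Find the largest rank k such that p_(k) <= k * alpha / m,
    then reject the hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for six pairwise system comparisons (illustrative only,
# not taken from the paper).
p_values = [0.001, 0.008, 0.012, 0.020, 0.030, 0.600]

fwer_decisions = bonferroni(p_values)
fdr_decisions = benjamini_hochberg(p_values)

print(sum(fwer_decisions), "pairs declared different under Bonferroni (FWER)")
print(sum(fdr_decisions), "pairs declared different under Benjamini-Hochberg (FDR)")
```

On these made-up values the FWER-style correction declares fewer pairs significantly different than the FDR-style one, which is the sense in which the abstract calls the former more conservative and the latter more lenient.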
