
On Per-topic Variance in IR Evaluation


Abstract

We explore the notion, put forward by Cormack & Lynam and Robertson, that we should consider a document collection used for Cranfield-style experiments as a sample from some larger population of documents. In this view, any per-topic metric (such as average precision) should be regarded as an estimate of that metric's true value for that topic in the full population, and therefore as carrying its own per-topic variance or estimate precision or noise. As in the two mentioned papers, we explore this notion by simulating other samples from the same large population. We investigate different ways of performing this simulation. One use of this analysis is to refine the notion of statistical significance of a difference between two systems (in most such analyses, each per-topic measurement is treated as equally precise). We propose a mixed-effects model method to measure significance, and compare it experimentally with the traditional t-test.
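
To make the idea of per-topic variance concrete, the following is a minimal sketch of one possible simulation: a bootstrap over documents, in which the collection is resampled with replacement and average precision is recomputed for a single topic on each replicate. The abstract does not say which simulation schemes the paper evaluates, so this is an illustration under assumptions rather than the authors' method. The function names (`average_precision`, `bootstrap_ap`) and the toy collection, ranking, and relevance judgments are all hypothetical.

```python
import random
import statistics
from collections import Counter


def average_precision(rel_flags, n_relevant):
    """AP for one topic: rel_flags are relevance flags in rank order,
    n_relevant is the number of relevant documents in the collection."""
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(rel_flags, start=1):
        if is_rel:
            hits += 1
            total += hits / rank
    return total / n_relevant


def bootstrap_ap(ranking, relevant, collection, n_samples=500, seed=0):
    """Recompute AP on bootstrap replicates of the document collection.

    Assumption: each replicate draws len(collection) documents with
    replacement; a document drawn k times appears as k adjacent copies
    in the ranking, and a document not drawn is dropped from it.
    """
    rng = random.Random(seed)
    aps = []
    for _ in range(n_samples):
        counts = Counter(rng.choices(collection, k=len(collection)))
        n_rel = sum(counts[d] for d in relevant)
        if n_rel == 0:
            continue  # AP is undefined when the replicate has no relevant docs
        flags = [d in relevant for d in ranking for _ in range(counts[d])]
        aps.append(average_precision(flags, n_rel))
    return aps


# Hypothetical toy data for a single topic.
collection = [f"d{i}" for i in range(50)]
relevant = {"d0", "d3", "d7", "d20"}
ranking = ["d0", "d5", "d3", "d9", "d7", "d11"]

aps = bootstrap_ap(ranking, relevant, collection)
print("mean AP over replicates:", statistics.mean(aps))
print("per-topic variance of AP:", statistics.variance(aps))
```

The spread of `aps` is a rough estimate of how noisy this topic's score is. A significance analysis in the spirit of the abstract would take that noise into account instead of treating every topic's measurement as equally precise.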
