Journal: ACM SIGIR Forum

Statistical Significance Testing in Information Retrieval: Theory and Practice




The past 20 years have seen a great improvement in the rigor of information retrieval experimentation, due primarily to two factors: high-quality, public, portable test collections such as those produced by TREC (the Text REtrieval Conference [38]), and the increased practice of statistical hypothesis testing to determine whether measured improvements can be ascribed to something other than random chance. Together these create a very useful standard for reviewers, program committees, and journal editors; work in information retrieval (IR) increasingly cannot be published unless it has been evaluated using a well-constructed test collection and shown to produce a statistically significant improvement over a good baseline.

But, as the saying goes, any tool sharp enough to be useful is also sharp enough to be dangerous. Statistical tests of significance are widely misunderstood. Most researchers and developers treat them as a “black box”: evaluation results go in and a p-value comes out. But because significance is such an important factor in determining what research directions to explore and what is published, using p-values obtained without thought can have consequences for everyone doing research in IR. Ioannidis has argued that the main consequence in the biomedical sciences is that most published research findings are false [20]; could that be the case in IR as well?




