Two stages in the measurement of techniques for information retrieval are the gathering of documents for relevance assessment and the use of the assessments to numerically evaluate effectiveness. We consider both of these stages in the context of the TREC experiments, to determine whether they lead to measurements that are trustworthy and fair. Our detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found. We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.
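The gathering stage the abstract refers to is conventionally done by depth-k pooling: each participating system submits a ranked run, and the union of the top k documents from every run forms the pool that human assessors judge for relevance. A minimal sketch of that construction (the function name, `runs` structure, and `depth` parameter are illustrative, not from the paper):

```python
# Sketch of standard TREC-style depth-k pooling for one topic.
# Each system contributes a ranked run (best document first); the
# judging pool is the union of the top-`depth` documents across runs.

def build_pool(runs, depth=100):
    """Return the set of document ids to be judged for relevance.

    runs: dict mapping system name -> list of doc ids, best first.
    depth: how many top-ranked documents each run contributes.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

# Illustrative runs from two hypothetical systems.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
}
print(sorted(build_pool(runs, depth=2)))  # → ['d1', 'd2', 'd3', 'd5']
```

Documents outside the pool are treated as not relevant, which is why the abstract's observation matters: relevant documents that no run ranked highly are never judged, so recall computed against the pool overstates the true figure.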
How reliable are the results of large-scale information retrieval experiments?