Information Processing & Management

Engineering a multi-purpose test collection for Web retrieval experiments


Abstract

Past research into text retrieval methods for the Web has been restricted by the lack of a test collection capable of supporting experiments that are both realistic and reproducible. The 1.69-million-document WT10g collection is proposed as a multi-purpose testbed for such experiments in distributed IR, hyperlink algorithms and conventional ad hoc retrieval. WT10g was constructed by selecting documents from a larger superset in such a way that desirable corpus properties were preserved or optimised. These properties include a high degree of inter-server connectivity, integrity of server holdings, inclusion of documents related to a very wide spread of likely queries, and a realistic distribution of server holding sizes. We confirm that WT10g contains exploitable link information using a site (homepage) finding experiment. Our results show that, on this task, Okapi BM25 works better on propagated link anchor text than on full text. WT10g was used in TREC-9 and TREC-2001, and both topic relevance and homepage finding queries and judgments are available.
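To make "Okapi BM25 on propagated link anchor text" concrete, here is a minimal sketch, not the authors' implementation. It assumes that propagation means concatenating the anchor text of every inbound link into a surrogate document for the target page, uses whitespace tokenisation, and uses the common parameter defaults k1 = 1.2 and b = 0.75 with one non-negative IDF variant. The abstract specifies none of these choices, and the function names and example URLs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_anchor_documents(links):
    """Propagate anchor text (assumed scheme): each link target gets a
    surrogate document made of the anchor text of all links pointing to it.
    Pages with no inbound links get no surrogate document at all."""
    surrogate = defaultdict(list)
    for _source, target, anchor in links:
        surrogate[target].extend(anchor.lower().split())
    return dict(surrogate)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (doc_id -> token list) against `query` with
    Okapi BM25. The IDF here is log((N - n + 0.5) / (n + 0.5) + 1), one
    common non-negative variant; k1 and b are assumed defaults."""
    n_docs = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for toks in docs.values():
        doc_freq.update(set(toks))

    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1 - b + b * len(toks) / avg_len)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return scores


if __name__ == "__main__":
    # Hypothetical link graph: (source URL, target URL, anchor text).
    links = [
        ("http://x.example/a", "http://site-a.example/", "CSIRO home page"),
        ("http://y.example/b", "http://site-a.example/", "CSIRO"),
        ("http://z.example/c", "http://site-a.example/research", "research at CSIRO"),
        ("http://w.example/d", "http://site-b.example/", "Australian National University"),
    ]
    docs = build_anchor_documents(links)
    scores = bm25_scores("csiro home page", docs)
    for url in sorted(scores, key=scores.get, reverse=True):
        print(f"{scores[url]:.3f}  {url}")
```

For homepage finding, the intuition is that many inbound links describe a site's entry page by name, so the anchor surrogate for that page repeats the query terms. Scoring the same pages on their full text would require a second index over page bodies, which this sketch omits.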
