TREC: An Overview

Abstract

The pervasiveness of digital information and the ease with which it can be found via sophisticated search engines make it hard to believe that this was not always the case. But back in 1992, when the Text REtrieval Conference (TREC) began, little digital information was publicly available, and it could only be found by knowing its exact location and using ftp to download the information. Commercial services such as Chemical Abstracts, BRS, and Dialog were available, but these operated on large-scale, custom-built databases of information culled from scientific journals and other such sources. These services were not available to the general public, except via subscribing libraries; furthermore, they could only be accessed via complex (Boolean) search mechanisms requiring a skilled user. Today's simple searching techniques, which return a ranked list of results, were developed in information retrieval research laboratories many years ago, but there was no way to test these on large-scale data; the only data available were collections of abstracts and/or titles on the order of two megabytes of text. In 1990 the National Institute of Standards and Technology (NIST) was asked to build a new, very large test collection for use in evaluating the text retrieval technology being developed as part of the U.S. Department of Defense, Advanced Research Projects Agency (DARPA) TIPSTER project (for more on the TIPSTER project, see Merchant, 1994). This collection was to be on the order of one million full-text documents, about 100 times larger than existing non-proprietary test collections. The following year, NIST proposed that this large test collection be made available to the full information retrieval community by the formation of TREC.