The pervasiveness of digital information and the ease with which it can be found via sophisticated search engines make it hard to believe that this was not always the case. But back in 1992, when the Text REtrieval Conference (TREC) began, little digital information was publicly available, and it could only be found by knowing its exact location and downloading it via FTP. Commercial services such as Chemical Abstracts, BRS, and Dialog were available, but these operated on large-scale, custom-built databases of information culled from scientific journals and other such sources. These services were not available to the general public, except via subscribing libraries; furthermore, they could only be accessed via complex (Boolean) search mechanisms requiring a skilled user. Today's simple searching techniques, which return a ranked list of results, were developed in information retrieval research laboratories many years ago, but there was no way to test them on large-scale data; the only data available were collections of abstracts and/or titles on the order of two megabytes of text. In 1990 the National Institute of Standards and Technology (NIST) was asked to build a new, very large test collection for use in evaluating the text retrieval technology being developed as part of the U.S. Department of Defense Advanced Research Projects Agency (DARPA) TIPSTER project (for more on the TIPSTER project, see Merchant, 1994). This collection was to be on the order of one million full-text documents, about 100 times larger than existing non-proprietary test collections. The following year, NIST proposed that this large test collection be made available to the full information retrieval community through the formation of TREC.