The World Wide Web can be viewed as a huge digital library, and a search engine is the typical means of tapping into this vast source of information. There are several situations in which it is of interest to discover the formula a search engine uses to determine the similarity of a document to a query. One such situation is as follows. Numerous search engines are currently available on the Internet, and their number is likely to increase many times over in the future. Finding the most desired documents can be a formidable task, as these documents may be located in different search engines. To facilitate the retrieval of documents, one solution is to construct a global search engine, also known as a metabroker or metasearch engine, on top of the (local) search engines [3, 7, 9]. One of the challenging problems in building an efficient and effective global search engine is the document selection problem (discussed below). Our solution to this problem requires that the global search engine know how a local search engine assigns similarities to documents with respect to a given query. When such information is not available, the global search engine may need to employ some means to discover how a local search engine determines document similarities.