Focused crawling is a relatively new, promising approach to improving the recall of expert search on the Web. It typically starts from a user- or community-specific tree of topics, along with a few training documents for each tree node, and then crawls the Web with a focus on these topics of interest. This process can efficiently build a theme-specific, hierarchical directory whose nodes are populated with relevant, high-quality documents for expert Web search. The BINGO! focused crawler implements an approach that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic archetypes (good authorities as determined by Kleinberg's HITS algorithm, and documents classified with high confidence using a linear SVM) and uses them to retrain the classifier periodically; in this way, the crawler is dynamically adapted to the most significant documents seen so far. While a large amount of information can be collected from the "Surface Web" with traditional crawling, as done by today's popular search engines, the major part of high-quality, topic-specific data is stored in searchable databases that produce results only dynamically, in response to a direct request (i.e., the "Hidden Web" or "Deep Web"). Automated metaportal generation for these hidden sources comes with all the traditional problems that a meta search engine has to face. The demonstration shows our approach towards fully automated portal generation, which starts with merely a small set of user-specific training documents and dynamically builds up a unified database of Surface Web data and of indexed Deep Web pages, the latter derived from on-the-fly generated Web Service interfaces for form pages that leverage Semantic-Web-style ontologies.
The prototype platform has been used to generate two applications that illustrate the effectiveness and versatility of our approach: the Handicrafts Information Portal (HIP), built for the Saarland's Chamber of Trades and Small Businesses, and a movie metaportal called MIPS. In the following sections we give a short overview of the BINGO! prototype system and then outline the above-mentioned application demos.
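The archetype selection described above can be illustrated with a small sketch. It is not the BINGO! implementation: the function names (`hits`, `select_archetypes`), the parameters `n_auth` and `n_conf`, and the dictionary-based input format are assumptions made for illustration, and the SVM confidence values are taken as given (for example, distances from the separating hyperplane) rather than computed. The sketch runs a plain power iteration for Kleinberg's HITS scores and then takes the union of the best authorities and the most confidently classified documents as the retraining set.

```python
from math import sqrt


def hits(links, iterations=50):
    """Compute HITS (authority, hub) scores by power iteration.

    `links` maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum the authority scores of all pages linked to.
        new_hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


def select_archetypes(links, svm_confidence, n_auth=2, n_conf=2):
    """Pick retraining candidates for one topic.

    `svm_confidence` maps each positively classified document to its
    classifier confidence (assumed to be precomputed elsewhere).
    Returns the union of the top authorities and the documents
    classified with the highest confidence.
    """
    auth, _ = hits(links)
    candidates = set(svm_confidence)
    top_auth = sorted(candidates, key=lambda d: auth.get(d, 0.0), reverse=True)[:n_auth]
    top_conf = sorted(candidates, key=lambda d: svm_confidence[d], reverse=True)[:n_conf]
    return set(top_auth) | set(top_conf)
```

In a crawler of this kind, a function like `select_archetypes` would be called once per topic at each retraining point, and its result would be added to that topic's training documents before the classifier is rebuilt. How many archetypes to admit per round is a tuning choice that the sketch leaves to the caller.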