An adaptive web crawling system crawls from a collection of web page crawling seeds according to user-specified web crawling criteria and generates a first utility measurement based on the web page snippets associated with individual search result items. The system generates a second utility measurement based on features extracted from the full web pages, which are downloaded under the guidance of the first utility measurement. A web page utility prediction function forecasts the second utility measurement from the first utility measurement. The system adapts its web crawling priorities based on the web page utility prediction function.
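The feedback loop described here can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the class names `UtilityPredictor` and `AdaptiveFrontier`, the choice of a one-variable least-squares line as the prediction function, and the representation of each utility measurement as a single number. A real system would likely use richer features and a different model.

```python
class UtilityPredictor:
    """Hypothetical prediction function: a least-squares line that maps the
    first (snippet) utility measurement to the second (full-page) one."""

    def __init__(self):
        self.pairs = []        # observed (snippet_utility, page_utility) pairs
        self.slope = 1.0       # start as the identity until data arrives
        self.intercept = 0.0

    def observe(self, snippet_utility, page_utility):
        """Record one downloaded page and refit the line."""
        self.pairs.append((snippet_utility, page_utility))
        n = len(self.pairs)
        if n < 2:
            return
        mean_x = sum(x for x, _ in self.pairs) / n
        mean_y = sum(y for _, y in self.pairs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in self.pairs)
        if var_x == 0:
            return             # all snippet utilities equal: keep the old fit
        cov = sum((x - mean_x) * (y - mean_y) for x, y in self.pairs)
        self.slope = cov / var_x
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, snippet_utility):
        """Forecast the full-page utility from the snippet utility."""
        return self.slope * snippet_utility + self.intercept


class AdaptiveFrontier:
    """Hypothetical crawl frontier that ranks pending URLs by predicted
    full-page utility, re-evaluated at every pop so that priorities follow
    the latest fit of the prediction function."""

    def __init__(self, predictor):
        self.predictor = predictor
        self.pending = []      # list of (url, snippet_utility)

    def push(self, url, snippet_utility):
        self.pending.append((url, snippet_utility))

    def pop(self):
        """Remove and return the pending entry with the highest prediction."""
        best = max(self.pending, key=lambda item: self.predictor.predict(item[1]))
        self.pending.remove(best)
        return best
```

In this sketch, a crawler would call `push` for each search result with its snippet-based utility, call `pop` to choose the next page to download, compute the full-page utility from the downloaded content, and pass both values to `observe`. Re-ranking at pop time is a linear scan, which keeps the example short; a production frontier would more plausibly use a priority queue with periodic re-scoring.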