An architecture for deep mining of network resource references, such as URLs. The architecture includes an extraction component configured to extract useful entity information from a collection of entity information derived from local search data; a distributed processing component configured to distributively query a search engine using the useful entity information and to receive search results from the search engine, the search results comprising resource references; and a selection component configured to remove non-relevant resource references to obtain candidate resource references and to select a top resource reference from the candidate resource references, using an unsupervised machine learning algorithm.
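The three components can be illustrated with a minimal sketch under stated assumptions; it is not the claimed implementation. Every name here is hypothetical: `fake_search` stands in for a real search engine, `BLOCKED_HOSTS` is an invented list of non-relevant domains, the entity fields `name` and `city` are assumed, and a thread pool stands in for distributed querying. The abstract does not specify the unsupervised machine learning algorithm, so a simple label-free token-overlap score is swapped in for it.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
import re

# Hypothetical aggregator domains treated as non-relevant (assumption).
BLOCKED_HOSTS = {"directory.example", "reviews.example"}

def extract_useful(entity):
    """Extraction: keep only the fields used for the query (assumed: name, city)."""
    return {k: entity[k].strip() for k in ("name", "city") if entity.get(k)}

def fake_search(query):
    """Stand-in for a real search engine call; returns canned URLs."""
    canned = {
        "Joe's Pizza Seattle": [
            "https://reviews.example/joes-pizza",
            "https://www.joespizza-seattle.example/",
            "https://news.example/food/best-pizza",
        ],
    }
    return canned.get(query, [])

def host_of(url):
    """Return the lowercased host of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()

def remove_non_relevant(urls):
    """Selection, step 1: drop URLs whose host is on the blocklist."""
    return [u for u in urls if host_of(u) not in BLOCKED_HOSTS]

def score(entity, url):
    """Label-free score: fraction of entity name tokens found in the host."""
    tokens = re.findall(r"[a-z0-9]+", entity["name"].lower().replace("'", ""))
    host = host_of(url)
    return sum(t in host for t in tokens) / len(tokens) if tokens else 0.0

def top_reference(entity, search=fake_search):
    """Run extraction, query, filtering, and top selection for one entity."""
    info = extract_useful(entity)
    urls = search(" ".join(info.values()))
    candidates = remove_non_relevant(urls)
    # Selection, step 2: pick the highest-scoring candidate, or None if empty.
    return max(candidates, key=lambda u: score(info, u), default=None)

def mine(entities, search=fake_search, workers=4):
    """Query for many entities concurrently (threads stand in for distribution)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda e: top_reference(e, search), entities))
```

Calling `mine([{"name": "Joe's Pizza ", "city": "Seattle"}])` with the canned data above filters out the blocked review site and selects the URL whose host matches the entity name. An entity with no search results yields `None`.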