Optimized web domains classification based on progressive crawling with clustering
Abstract
Techniques for optimized web domains classification based on progressive crawling with clustering are disclosed. In some embodiments, optimized web domains classification based on progressive crawling with clustering includes crawling a domain (e.g., a web site domain) to collect data for a subset of pages (e.g., web pages) of a corpus of content associated with the domain; classifying each of the crawled pages into one or more category clusters, in which the category clusters represent a content categorization of the corpus of content associated with the domain (e.g., a URL content categorization for the domain, host of that domain, and/or directory of that domain); and determining which of the one or more category clusters to publish for the domain.
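The three steps named in the abstract (crawl a subset of pages, classify each page into category clusters, decide which clusters to publish) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the keyword-based `classify_page`, the fixed page budget `max_pages` standing in for "a subset of pages", and the `min_share` publishing threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical keyword lists standing in for a real page classifier.
CATEGORY_KEYWORDS = {
    "sports": {"score", "league", "match"},
    "news": {"breaking", "report", "election"},
    "shopping": {"cart", "price", "checkout"},
}


def classify_page(text: str) -> set[str]:
    """Return every category whose keywords appear in the page text."""
    words = set(text.lower().split())
    return {cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws}


def crawl_and_cluster(pages: Iterable[str], max_pages: int) -> tuple[Counter, int]:
    """Classify at most max_pages pages and count pages per category cluster.

    `pages` stands in for a crawler that yields page text for one domain;
    no network access happens in this sketch.
    """
    clusters: Counter = Counter()
    crawled = 0
    for text in pages:
        if crawled >= max_pages:
            break
        crawled += 1
        # A page may fall into more than one category cluster.
        clusters.update(classify_page(text))
    return clusters, crawled


def clusters_to_publish(clusters: Counter, crawled: int, min_share: float = 0.3) -> list[str]:
    """Publish clusters covering at least min_share of the crawled pages.

    The threshold rule is an assumption; the abstract does not say how
    the published clusters are chosen.
    """
    if crawled == 0:
        return []
    return sorted(cat for cat, n in clusters.items() if n / crawled >= min_share)
```

For example, given the pages `"league match score"`, `"breaking report on league"`, `"cart price"`, and `"match score tonight"` with a budget of four pages, the sports cluster covers three of four pages and is the only one that clears the assumed 30% threshold.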