Optimized web domains classification based on progressive crawling with clustering

Abstract

Techniques for optimized web domains classification based on progressive crawling with clustering are disclosed. In some embodiments, optimized web domains classification based on progressive crawling with clustering includes crawling a domain (e.g., a web site domain) to collect data for a subset of pages (e.g., web pages) of a corpus of content associated with the domain; classifying each of the crawled pages into one or more category clusters, in which the category clusters represent a content categorization of the corpus of content associated with the domain (e.g., a URL content categorization for the domain, host of that domain, and/or directory of that domain); and determining which of the one or more category clusters to publish for the domain.
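The three steps in the abstract (crawl a subset of pages, classify each page into category clusters, decide which clusters to publish) can be sketched roughly as follows. This is only an illustrative sketch and not the patented method: the keyword-based `classify_page`, the batch-wise stopping rule, and the `batch_size`, `max_pages`, and `publish_share` parameters are all assumptions introduced here, and the "crawl" is simulated by an iterable of page texts instead of real HTTP requests.

```python
from __future__ import annotations

from collections import Counter
from itertools import islice
from typing import Iterable

# Hypothetical keyword lists standing in for a real page classifier.
KEYWORDS = {
    "sports": ("score", "league", "match"),
    "news": ("breaking", "report", "election"),
    "shopping": ("cart", "price", "checkout"),
}


def classify_page(text: str) -> set[str]:
    """Return every category whose keywords appear in the page text."""
    lowered = text.lower()
    return {
        category
        for category, words in KEYWORDS.items()
        if any(word in lowered for word in words)
    }


def classify_domain(
    pages: Iterable[str],
    batch_size: int = 2,      # assumed: pages crawled between stability checks
    max_pages: int = 10,      # assumed: upper bound on the crawled subset
    publish_share: float = 0.3,  # assumed: minimum share of pages per cluster
) -> set[str]:
    """Crawl pages progressively and return the category clusters to publish."""
    clusters: Counter[str] = Counter()  # category -> number of pages in it
    crawled = 0
    previous: set[str] | None = None

    for text in islice(pages, max_pages):
        clusters.update(classify_page(text))
        crawled += 1
        if crawled % batch_size == 0:
            current = {c for c, n in clusters.items() if n / crawled >= publish_share}
            # Assumed stopping rule: stop once two consecutive batches agree.
            if current == previous:
                break
            previous = current

    if crawled == 0:
        return set()
    return {c for c, n in clusters.items() if n / crawled >= publish_share}
```

A page may fall into more than one cluster, matching the "one or more category clusters" wording of the abstract. Calling `classify_domain` with an iterable of page texts returns the set of categories whose share of crawled pages meets the assumed threshold.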

Bibliographic data

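  • Publication number: US9443019B2
  • Publication date: 2016-09-13
  • Original format: PDF
  • Applicant/Assignee: PALO ALTO NETWORKS INC.
  • Application number: US201514601008
  • Inventors: RENARS GAILIS; LIN XU; RENZO LAZZARATO
  • Filing date: 2015-01-20
  • Classification: G06F17/00; G06F17/30
  • Country: US
  • Database entry date: 2022-08-21 14:32:52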
