WEB CRAWLER FOR ACQUIRING CONTENT

Abstract

An adaptive web crawling system starts from a collection of crawling seeds and, following user-specified crawling criteria, generates a first utility measurement from the web page snippets associated with individual search result items. Guided by the first utility measurement, the system downloads the full web pages and generates a second utility measurement from features extracted from them. A web page utility prediction function forecasts the second utility measurement from the first, and the system adapts its crawling priorities based on this prediction function.
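The loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented method: the abstract does not say how either utility is computed or what form the prediction function takes. The term-overlap scores, the least-squares line, and all names here (`snippet_utility`, `page_utility`, `UtilityPredictor`, `crawl`, `fetch`) are assumptions made for the example.

```python
def snippet_utility(snippet, terms):
    """First utility (assumed form): share of criteria terms found in the snippet."""
    if not terms:
        return 0.0
    words = set(snippet.lower().split())
    return sum(t in words for t in terms) / len(terms)


def page_utility(page_text, terms):
    """Second utility (assumed form): criteria-term hits per word of the full page."""
    words = page_text.lower().split()
    if not words:
        return 0.0
    term_set = set(terms)
    return sum(w in term_set for w in words) / len(words)


class UtilityPredictor:
    """Assumed prediction function: least-squares line u2 ~ a * u1 + b."""

    def __init__(self):
        self.pairs = []
        self.a, self.b = 1.0, 0.0  # before any data, predict u2 = u1

    def observe(self, u1, u2):
        """Record one (first, second) utility pair and refit the line."""
        self.pairs.append((u1, u2))
        n = len(self.pairs)
        mean_x = sum(x for x, _ in self.pairs) / n
        mean_y = sum(y for _, y in self.pairs) / n
        var = sum((x - mean_x) ** 2 for x, _ in self.pairs)
        if var > 0:  # a slope needs at least two distinct u1 values
            cov = sum((x - mean_x) * (y - mean_y) for x, y in self.pairs)
            self.a = cov / var
            self.b = mean_y - self.a * mean_x

    def predict(self, u1):
        return self.a * u1 + self.b


def crawl(results, fetch, terms, budget):
    """Download up to `budget` pages, highest predicted utility first.

    results: list of (url, snippet) search result items
    fetch:   hypothetical callable mapping a url to its page text
    """
    predictor = UtilityPredictor()
    pending = {url: snippet_utility(snippet, terms) for url, snippet in results}
    visited = []
    while pending and len(visited) < budget:
        # Re-rank on every step so the latest fit drives the priorities.
        url = max(pending, key=lambda u: predictor.predict(pending[u]))
        u1 = pending.pop(url)
        u2 = page_utility(fetch(url), terms)
        predictor.observe(u1, u2)
        visited.append((url, u1, u2))
    return visited, predictor
```

Re-ranking the pending set on every iteration keeps the sketch short; a larger crawler would more likely keep a priority queue and re-score it only when the fitted function changes noticeably.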

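Bibliographic details

  • Publication number: US2016055243A1

  • Publication date: 2016-02-25

  • Original format: PDF

  • Applicant/assignee: UT BATTELLE LLC

  • Application number: US201514832393

  • Inventors: HONG JUN; SONGHUA XU

  • Filing date: 2015-08-21

  • Classification: G06F17/30

  • Country: US

  • Database entry time: 2022-08-21 14:35:13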
