INFORMATION FORECAST AND ACQUISITION METHOD BASED ON WEBPAGE LINK PARAMETER ANALYSIS

Abstract

The present invention discloses an information prediction and crawling method based on webpage link parameter analysis, comprising the following sequence of steps: calculating statistical parameter features of webpage links, calculating the distribution patterns of the outlinks contained in webpages, classifying the webpages according to the distribution patterns of their outlinks, performing a sampling prediction on webpage resources, performing a crawling test on the prediction samples, and performing an overall prediction on the webpage resources. The method compensates for the deficiencies of the traditional webpage crawling approach, expands the quantity of link resources to be crawled, predicts a large number of undiscovered webpage resources from the features of known webpage resources, and improves the speed and coverage rate of webpage information crawling.
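The abstract does not specify how the statistics or predictions are computed, so the following Python sketch is only one possible reading of three of the listed steps (link parameter statistics, sampling prediction with a crawling test, and overall prediction). It assumes that links carry a single numeric query parameter and that gaps in the observed value range are plausible undiscovered pages. The function names, the `exists` callback, and the thresholds are all hypothetical, and the outlink distribution and page classification steps are not modeled.

```python
import random
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def link_parameter_stats(urls):
    """Collect, per (scheme, host, path, parameter), the numeric values seen.

    Assumption: only links with exactly one decimal query parameter are used.
    """
    stats = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        pairs = parse_qsl(parts.query)
        if len(pairs) == 1 and pairs[0][1].isdecimal():
            name, value = pairs[0]
            stats[(parts.scheme, parts.netloc, parts.path, name)].add(int(value))
    return stats


def predict_candidates(urls):
    """Predict unseen links by filling gaps in each parameter's observed range."""
    candidates = []
    for (scheme, netloc, path, name), values in link_parameter_stats(urls).items():
        for v in range(min(values), max(values) + 1):
            if v not in values:
                query = urlencode({name: v})
                candidates.append(urlunsplit((scheme, netloc, path, query, "")))
    return candidates


def sample_then_predict(urls, exists, sample_size=5, min_hit_rate=0.5, seed=0):
    """Crawl-test a random sample of predictions before accepting all of them.

    `exists` is a caller-supplied function (hypothetical here) that returns
    True when a predicted URL can actually be fetched.
    """
    candidates = predict_candidates(urls)
    if not candidates:
        return []
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    hit_rate = sum(1 for url in sample if exists(url)) / len(sample)
    # Accept the overall prediction only if the sample test looks promising.
    return candidates if hit_rate >= min_hit_rate else []
```

Passing the crawling test in as the `exists` callback keeps the sketch independent of any particular HTTP client; a real crawler would supply a function that issues a request and checks the response status.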
