
URL AND ANCHOR TEXT ANALYSIS FOR FOCUSED CRAWLING

Abstract

Systems and methods of URL and anchor text analysis for focused crawling are disclosed. In an exemplary embodiment, a method may include training a focused crawler by obtaining a training set of at least URLs or anchor text for a website, computing a score for the training set, extracting a plurality of features of the training set, and computing a score for each of the plurality of features. The features identify key information contained in the website. The method may also include executing the trained focused crawler on other websites.
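
The training steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the scores are computed, so everything specific here is an assumption made for illustration: the training set is taken to be hand-labeled (relevant or not), the features are taken to be lowercase tokens from URLs and anchor text, and each feature is scored with a smoothed log-odds ratio. The names `tokenize`, `train`, `score_link`, and the `example.com` URLs are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a URL or anchor text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Compute a score per token from (text, is_relevant) pairs.

    Assumption: the score is a smoothed log-odds ratio of how often a
    token appears in relevant versus non-relevant training items.
    """
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, is_relevant in examples:
        tokens = set(tokenize(text))  # count each token once per item
        if is_relevant:
            pos.update(tokens)
            n_pos += 1
        else:
            neg.update(tokens)
            n_neg += 1
    scores = {}
    for tok in set(pos) | set(neg):
        p = (pos[tok] + 1) / (n_pos + 2)  # add-one smoothing
        q = (neg[tok] + 1) / (n_neg + 2)
        scores[tok] = math.log(p / q)
    return scores


def score_link(text, scores):
    """Score an unseen URL or anchor text; unknown tokens contribute 0."""
    return sum(scores.get(tok, 0.0) for tok in set(tokenize(text)))


# Hypothetical training set gathered from one website.
TRAINING_SET = [
    ("https://example.com/products/laptop-x1/specs", True),
    ("Technical specifications", True),
    ("https://example.com/products/phone-z/specs", True),
    ("https://example.com/about/careers", False),
    ("Privacy policy", False),
    ("https://example.com/legal/terms", False),
]

feature_scores = train(TRAINING_SET)

# Apply the trained scores to links found on a different website.
candidates = [
    "https://other.example.org/legal/privacy",
    "https://other.example.org/catalog/specs",
]
ranked = sorted(
    candidates,
    key=lambda link: score_link(link, feature_scores),
    reverse=True,
)
```

In this sketch a crawler would follow the links in `ranked` from the front, so links whose tokens resembled the relevant training items are visited first. A real system would likely need a larger training set and some handling of tokens that carry no signal, such as `https` or `com`.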
