Journal: 《微型电脑应用》 (Microcomputer Applications)

An Improved Shark-Search Algorithm for Topic Crawlers

Abstract

In topic crawling, the Shark-Search algorithm takes too little account of the global context of web pages. To make up for this shortcoming, this paper uses the PageRank algorithm to compute an authority value for each URL awaiting download and proposes the Shark-PageRank algorithm, which weighs the value of a URL using its anchor text, the text near the anchor text, and the authority value of the web page. The experimental results show that the algorithm increases the speed of the topic crawler per unit time and that, as the number of pages grows, it maintains good accuracy and stability.
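The abstract names the three signals that Shark-PageRank combines but does not give the scoring formula. The following Python code is only a minimal sketch of how such a combination might look, not the paper's method. The linear weighting, the weight values, cosine similarity over word counts as the relevance measure, the normalization of authority by the largest PageRank value, and all function and parameter names are assumptions made for illustration. The score inheritance from parent pages used in the original Shark-Search is left out.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a link graph {page: [outgoing links]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


def url_score(topic, anchor_text, context_text, authority, max_authority,
              w_anchor=0.4, w_context=0.3, w_authority=0.3):
    """Hypothetical linear mix of the three signals named in the abstract.

    The weights are illustrative assumptions, not values from the paper.
    """
    norm_authority = authority / max_authority if max_authority else 0.0
    return (w_anchor * cosine_similarity(topic, anchor_text)
            + w_context * cosine_similarity(topic, context_text)
            + w_authority * norm_authority)


if __name__ == "__main__":
    # Toy link graph; page names are made up for the example.
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    ranks = pagerank(links)
    top = max(ranks.values())
    score = url_score("topic crawler", "topic crawler tutorial",
                      "a guide to crawler design", ranks["c"], top)
    print(round(score, 3))
```

In a crawler, a score of this kind would serve as the priority of a URL in the download queue, so that links with relevant anchor text on authoritative pages are fetched first.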
