In topic crawlers, the Shark-Search algorithm gives insufficient consideration to the global structure of the web. To make up for this shortcoming, this paper uses the PageRank algorithm to compute the authority value of each URL waiting to be downloaded, and proposes the Shark-PageRank algorithm, which measures the value of a URL from its anchor text, the text near the anchor, and the authority value of the web page. Experimental results show that the algorithm increases the speed of the topic crawler per unit time, and that it maintains good accuracy and stability as the number of pages grows.
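The abstract names the three signals Shark-PageRank combines but not the formula that combines them. The sketch below is one possible reading, not the paper's method. It makes several assumptions. Relevance is measured by bag-of-words cosine similarity. The rule "a relevant anchor gives a context score of 1" comes from the original Shark-Search. Neighborhood relevance and PageRank authority, normalized by the largest PageRank value, are mixed linearly. The weights `alpha` and `beta` and all function names are hypothetical, and Shark-Search's inherited score from parent pages is omitted for brevity.

```python
import heapq
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def pagerank(graph: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; pages without out-links spread their rank uniformly."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for u, outs in graph.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        rank = new
    return rank


def url_value(topic: str, anchor: str, context: str, authority: float,
              max_authority: float, alpha: float = 0.7, beta: float = 0.8) -> float:
    """Hypothetical Shark-PageRank-style score: anchor + context + authority."""
    anchor_score = cosine(topic, anchor)
    # Rule borrowed from Shark-Search: a relevant anchor makes the context score 1.
    context_score = 1.0 if anchor_score > 0 else cosine(topic, context)
    neighborhood = beta * anchor_score + (1 - beta) * context_score
    # Assumed linear mix with PageRank normalized to [0, 1].
    return alpha * neighborhood + (1 - alpha) * authority / max_authority


def order_frontier(topic: str, candidates: list[tuple[str, str, str]],
                   graph: dict[str, list[str]]) -> list[str]:
    """Return candidate URLs in the order a best-first crawler would fetch them."""
    ranks = pagerank(graph)
    top = max(ranks.values())
    heap = []
    for url, anchor, context in candidates:
        score = url_value(topic, anchor, context, ranks.get(url, 0.0), top)
        heapq.heappush(heap, (-score, url))  # max-priority via negated score
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Toy example (hypothetical data): "c" has the highest PageRank,
# but "b" has the most topic-relevant anchor text.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
candidates = [
    ("b", "football news", ""),
    ("c", "home", "latest football scores"),
    ("d", "contact us", ""),
]
print(order_frontier("football", candidates, graph))  # ['b', 'c', 'd']
```

In this toy graph, authority alone would place `c` first, but the relevant anchor text on `b` outweighs it. Authority still lifts `c`, whose anchor is irrelevant, well above `d`. Shifting `alpha` toward 0 would let authority dominate the ordering.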