
A New Algorithm of Topical Crawler


Abstract

Generic crawlers help people find information on the WWW. However, because they are general-purpose rather than specialized, they fall short in precision and efficiency. In this paper, we address two issues of the topical web crawler: how to define the topic, and how to efficiently order the links waiting in the download queue. The crawler aims to visit only relevant pages and to collect a large number of hyperlinks that point to relevant pages. The crawl method proposed here is novel in that it is based on the semi-structured features of websites together with content information. Experimental results show that it is a very effective method for focused crawling.
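
The abstract does not give the scoring rule or the queue design, so the following is only a minimal sketch of the general idea of ordering a download queue by topical relevance, not the paper's algorithm. It assumes the topic is defined as a plain set of keywords and that a link is scored by the fraction of those keywords found in some text associated with it (for example its anchor text). The names `relevance` and `Frontier` are hypothetical.

```python
import heapq
import itertools
import re


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in `text` (assumed scoring rule)."""
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & topic_terms) / len(topic_terms)


class Frontier:
    """Download queue that always yields the highest-scored unseen URL."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        # Monotonic counter breaks ties so equal scores pop in insertion order.
        self._counter = itertools.count()

    def push(self, url, score):
        """Queue `url` unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the best first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Remove and return (url, score) for the most relevant queued URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)
```

In use, a crawler would call `push` for every link extracted from a fetched page, scoring it with `relevance`, and call `pop` to choose the next page to download. A real focused crawler would likely combine several signals (page content, site structure, link context) rather than keyword overlap alone.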
