Conference: International Conference on Intelligent Systems and Knowledge Engineering

Gray Tunneling Based on Block Relevance for Focused Crawling


Abstract

This paper defines Gray Tunneling and uses a content block algorithm to improve a focused crawler's ability to traverse gray tunnels. Gray Tunneling addresses the problem that a page covering multiple topics has its measured relevance diluted, even when part of the page is highly relevant. During crawling, to avoid the effect of pages that are irrelevant to the target topic as a whole but relevant in part, we divide a multi-topic page into several blocks and process each block individually. The crawler can then traverse a page that is irrelevant as a whole, which extends its reach and yields more relevant pages. A comprehensive experiment shows that this approach outperforms the Best-First and Breadth-First algorithms in both harvest rate and efficiency.
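The block-level idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's segmentation method or relevance measure, so everything here is an assumption made for illustration: blocks are supplied already segmented, relevance is the cosine similarity between term-frequency vectors, and the helper names (`relevance`, `score_links`), the topic terms, and the example.com URLs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for real text preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(text, topic_terms):
    """Relevance of a text to the topic (assumed measure: TF cosine)."""
    return cosine(Counter(tokenize(text)), Counter(topic_terms))


def score_links(blocks, topic_terms):
    """Score each outgoing link by the relevance of its enclosing block.

    blocks: list of (block_text, [urls]) pairs, already segmented.
    A URL that appears in several blocks keeps its highest score.
    """
    scores = {}
    for text, urls in blocks:
        r = relevance(text, topic_terms)
        for url in urls:
            scores[url] = max(scores.get(url, 0.0), r)
    return scores


# Hypothetical multi-topic page: only the middle block is on topic.
topic_terms = ["crawler", "web", "search"]
blocks = [
    ("Football scores and league tables for the weekend matches",
     ["http://example.com/sport"]),
    ("A focused web crawler improves search quality",
     ["http://example.com/crawler"]),
    ("Weather forecast rain and wind expected tomorrow across the region",
     ["http://example.com/weather"]),
]

page_text = " ".join(text for text, _ in blocks)
page_relevance = relevance(page_text, topic_terms)
link_scores = score_links(blocks, topic_terms)
```

In this toy page, the whole-page relevance is lower than the relevance of the on-topic block, so a crawler that scores links per block would still rank the link inside that block highly instead of discarding the page's links together.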