
A New Method for Focused Crawler Cross Tunnel



Abstract

Focused crawlers are programs designed to selectively retrieve Web pages relevant to a specific domain, for use by domain-specific search engines. Tunneling is a heuristic-based method that addresses the global optimization problem. In this paper, we use a content block algorithm to enhance a focused crawler's ability to traverse tunnels. The novel algorithm avoids both the overly coarse granularity of evaluating the whole page and the overly fine granularity of evaluating link context alone. A comprehensive experiment shows that this approach outperforms the BestFirst and anchor-text algorithms in both harvest ratio and efficiency.
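
The abstract does not spell out the content block algorithm, so the following is only a minimal Python sketch of the general idea, under stated assumptions: a page is assumed to be already segmented into blocks, each link inherits the relevance of its enclosing block (a granularity between whole-page and link-context scoring), and a best-first frontier orders the crawl. The names `Block`, `relevance`, `score_links`, and `Frontier`, as well as the term-overlap score, are hypothetical and not taken from the paper.

```python
import heapq
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Block:
    """A content block: its visible text and the URLs linked from it (assumed input)."""
    text: str
    links: List[str]


def relevance(text: str, topic_terms: Set[str]) -> float:
    """Toy relevance score: fraction of tokens in `text` that are topic terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in topic_terms for token in tokens) / len(tokens)


def score_links(blocks: List[Block], topic_terms: Set[str]) -> Dict[str, float]:
    """Score each link by its enclosing block; keep the best score if a link repeats."""
    scores: Dict[str, float] = {}
    for block in blocks:
        block_score = relevance(block.text, topic_terms)
        for url in block.links:
            scores[url] = max(block_score, scores.get(url, 0.0))
    return scores


class Frontier:
    """Best-first crawl frontier: highest-scoring unseen URL is popped first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: Set[str] = set()
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url: str, score: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In this sketch, links from low-scoring blocks are kept in the frontier at low priority instead of being discarded, so a relevant page reachable only through an off-topic page can still be visited eventually. That is one simple way to allow tunneling; the paper's own mechanism may differ.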
