A New Method for Focused Crawler Cross Tunnel

Abstract

Focused crawlers are programs designed to selectively retrieve Web pages relevant to a specific domain, for use by domain-specific search engines. Tunneling is a heuristic-based method for the global optimization problem in focused crawling: it lets the crawler pass through less relevant pages in order to reach relevant ones beyond them. In this paper we use a content block algorithm to enhance a focused crawler's ability to traverse tunnels. The new algorithm avoids both the overly coarse granularity of evaluating the whole page and the overly fine granularity of evaluating link context alone. In a comprehensive experiment, this approach clearly outperforms the BestFirst and Anchor text algorithms in both harvest ratio and efficiency.
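The abstract does not spell out the content block algorithm, so the following is only a minimal sketch of the general idea of scoring links at block granularity. It assumes that a page has already been segmented into content blocks, that relevance is measured by cosine similarity between term-frequency vectors, and that each link inherits the score of the block containing it. The function names, the topic terms, and the example URLs are all hypothetical and are not taken from the paper.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text, topic_terms):
    """Cosine similarity between term-frequency vectors of text and topic."""
    a, b = Counter(tokenize(text)), Counter(topic_terms)
    dot = sum(a[t] * b[t] for t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_links_by_block(blocks, topic_terms):
    """Give each link the relevance score of the content block containing it.

    `blocks` is a list of (block_text, [urls]) pairs. How a page is segmented
    into blocks is assumed to happen elsewhere (assumption, not from the paper).
    """
    scores = {}
    for text, urls in blocks:
        s = cosine_similarity(text, topic_terms)
        for url in urls:
            # A link that appears in several blocks keeps its best score.
            scores[url] = max(s, scores.get(url, 0.0))
    return scores


def push_links(frontier, scores):
    """Add scored links to the crawl frontier, highest score first.

    heapq is a min-heap, so scores are negated before being pushed.
    """
    for url, s in scores.items():
        heapq.heappush(frontier, (-s, url))


# Hypothetical page: one off-topic block and one on-topic block.
topic = ["crawler", "search", "web"]
blocks = [
    ("Latest football results and transfer news", ["http://example.com/sports"]),
    ("A focused web crawler feeds a domain search engine", ["http://example.com/crawler"]),
]

page_text = " ".join(text for text, _ in blocks)
page_score = cosine_similarity(page_text, topic)   # whole-page granularity
link_scores = score_links_by_block(blocks, topic)  # block granularity

frontier = []
push_links(frontier, link_scores)
best_url = heapq.heappop(frontier)[1]
```

In this toy page the off-topic block dilutes the whole-page score, while the block-level score keeps the on-topic link at the front of the frontier and gives the off-topic link a score of zero. That is the kind of middle granularity the abstract describes, although the paper's actual scoring may differ.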
