Minimizing the Network Distance in Distributed Web Crawling

Abstract

Distributed crawling has been shown to overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current crawlers is not fully utilized: the benefits of the approach are usually limited to the sites hosting the crawler. In this work we describe IPMicra, a distributed, location-aware web crawler that uses an IP address hierarchy to crawl links in a near-optimal, location-aware manner. The crawler outperforms earlier distributed crawling approaches without significant overhead.
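
The abstract does not say how the IP address hierarchy is consulted. As a minimal sketch, assuming each crawler has already been delegated one or more subnets it is believed to be close to, a page's resolved IP address could be routed to the crawler that holds the most specific enclosing subnet. The delegation table, the crawler names, and the `assign_crawler` function are hypothetical illustrations and are not taken from IPMicra.

```python
import ipaddress

# Hypothetical delegation table: subnet -> crawler assumed to be nearest to it.
# The prefixes are reserved documentation ranges; the crawler names are invented.
DELEGATIONS = {
    ipaddress.ip_network("0.0.0.0/0"): "crawler-default",
    ipaddress.ip_network("198.51.100.0/24"): "crawler-eu",
    ipaddress.ip_network("203.0.113.0/24"): "crawler-asia",
    ipaddress.ip_network("203.0.113.128/25"): "crawler-oceania",
}


def assign_crawler(host_ip: str) -> str:
    """Return the crawler holding the most specific subnet that contains host_ip.

    IPv4 only in this sketch: the catch-all 0.0.0.0/0 entry guarantees a match.
    """
    address = ipaddress.ip_address(host_ip)
    # Collect every delegated subnet that contains the address.
    matches = [net for net in DELEGATIONS if address in net]
    # Longest prefix wins, i.e. the deepest node in the hierarchy.
    best = max(matches, key=lambda net: net.prefixlen)
    return DELEGATIONS[best]
```

For example, `assign_crawler("203.0.113.200")` falls inside both the /24 and the /25, so the /25 entry is chosen. A linear scan is enough for a handful of subnets; a real deployment with many delegations would more likely use a prefix tree.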
