
Distributed High-Performance Web Crawler Based on Peer-to-Peer Network


Abstract

Distributing crawling activity among multiple machines spreads the processing and reduces the web page analysis load on each machine. This paper presents the design of a distributed web crawler based on a Peer-to-Peer (P2P) network. The distributed crawler harnesses the excess bandwidth and computing resources of the nodes in the system to crawl the web. Each crawler is deployed on a computing node of the P2P network, where it analyzes web pages and generates indices. A control node is a separate node in charge of distributing URLs to balance the load across crawlers. The control nodes are themselves organized as a P2P network. The crawler nodes managed by the same control node form a group. Based on its own ID and the average load of its group, a crawler decides whether to transmit a URL to its control node or to keep the URL itself. We present an implementation of the distributed crawler based on Igloo and evaluate load balancing across the crawlers and crawl speed in a simulated environment.
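The hold-or-forward decision described in the abstract can be sketched as follows. This is only an illustrative sketch, not the paper's algorithm: the abstract says the decision depends on the crawler ID and the group's average load, but does not give the rule. The hash-based ownership test (`url_owner`), the `slack` threshold, and the `Crawler` class are all hypothetical names and assumptions introduced here.

```python
import hashlib
from dataclasses import dataclass, field


def url_owner(url: str, group_size: int) -> int:
    """Map a URL to a crawler ID in [0, group_size) with a stable hash.

    Assumption: the abstract does not say how URLs relate to crawler IDs;
    hashing is one common choice.
    """
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_size


@dataclass
class Crawler:
    crawler_id: int
    group_size: int
    queue: list = field(default_factory=list)      # URLs this crawler holds
    forwarded: list = field(default_factory=list)  # URLs sent to the control node

    def should_hold(self, url: str, group_avg_load: float, slack: float = 1.5) -> bool:
        """Hold the URL only if it maps to this crawler's ID and the crawler's
        queue is not far above the group's average load (assumed rule)."""
        owns = url_owner(url, self.group_size) == self.crawler_id
        not_overloaded = len(self.queue) <= slack * group_avg_load
        return owns and not_overloaded

    def handle(self, url: str, group_avg_load: float) -> str:
        """Keep the URL locally or hand it to the control node for redistribution."""
        if self.should_hold(url, group_avg_load):
            self.queue.append(url)
            return "hold"
        self.forwarded.append(url)
        return "forward"
```

In this sketch, a crawler that receives a URL hashed to another ID, or whose queue exceeds `slack` times the group average, passes the URL to the control node, which would then be responsible for rebalancing it within or across groups.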
