A task scheduling strategy based on weighted round robin for distributed crawler

Abstract

With the rapid development of the network, stand-alone crawlers are finding it hard to find and gather information, and distributed crawlers are gradually being adopted to solve this problem. This paper proposes a task scheduling strategy based on weighted round robin for small-scale distributed crawlers, in which a formula assigns the weight of the current node based on its crawling efficiency. It implements a distributed crawler system with multithreading support and deduplication that takes the algorithm as its core, and discusses some possible extensions and details. The design of the error recovery mechanism and the node table gives crawling nodes flexible scalability and fault tolerance. Finally, we conducted experiments to demonstrate the good load balancing performance of the system. Concurrency and Computation: Practice and Experience, 2015. © 2015 Wiley Periodicals, Inc. Copyright © 2015 John Wiley & Sons, Ltd.
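
The abstract does not state the weight formula or the scheduler's data structures, so the following is only a minimal sketch of how weighted round robin could distribute URLs across crawling nodes. It uses the well-known "smooth" weighted round robin variant, which may differ from the paper's algorithm. The names `Node`, `pick_node`, and `assign` are hypothetical, and the integer `weight` merely stands in for whatever efficiency-based value the paper computes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    weight: int       # assumption: a stand-in for an efficiency-based weight
    current: int = 0  # running counter used by smooth weighted round robin


def pick_node(nodes):
    """Return the next node using smooth weighted round robin."""
    total = sum(n.weight for n in nodes)
    # Every node gains its own weight on each round.
    for n in nodes:
        n.current += n.weight
    # The node with the largest counter is served, then pays back the total.
    chosen = max(nodes, key=lambda n: n.current)
    chosen.current -= total
    return chosen


def assign(urls, nodes):
    """Pair each URL with the name of the node chosen for it."""
    return [(url, pick_node(nodes).name) for url in urls]
```

Under these assumptions, a node with weight 5 receives five of every seven URLs when the other two nodes have weight 1, and its turns are interleaved with theirs rather than served in one burst.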