Data-Parallel Web Crawling Models

Abstract

The need to quickly locate, gather, and store the vast amount of material on the Web necessitates parallel computing. In this paper, we propose two models, based on multi-constraint graph partitioning, for efficient data-parallel Web crawling. The models aim to balance the amount of data downloaded and stored by each processor as well as the number of page requests made by the processors. The models also minimize the total volume of communication during the link exchange between the processors. To evaluate the performance of the models, experimental results are presented on a sample Web repository containing around 915,000 pages.
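To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of multi-constraint partitioning for crawling: each page carries two weights, its size in bytes (download/storage load) and one page request, and hyperlinks crossing processor boundaries stand in for link-exchange communication. The greedy heuristic, the scoring function, and the neighbour-affinity weight are all illustrative assumptions; the paper itself relies on dedicated graph-partitioning tools.

```python
def partition(pages, links, k):
    """Greedily assign each page to one of k processors, balancing two
    constraints (bytes and request counts) while preferring processors
    that already hold the page's neighbours (fewer cut links means less
    link-exchange communication).

    pages: dict page_id -> size_bytes
    links: list of (src, dst) hyperlink pairs
    k:     number of processors
    """
    # Undirected adjacency for counting would-be cut links.
    adj = {p: set() for p in pages}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)

    size_load = [0] * k   # constraint 1: bytes stored per processor
    req_load = [0] * k    # constraint 2: page requests per processor
    assign = {}

    # Place heavier pages first so large items are spread out early.
    for p in sorted(pages, key=pages.get, reverse=True):
        best, best_score = None, None
        for proc in range(k):
            # Number of already-placed neighbours on this processor.
            gain = sum(1 for q in adj[p] if assign.get(q) == proc)
            # Lower score is better: combined load minus an arbitrary
            # affinity bonus (the weight 1000 is a tuning knob).
            score = size_load[proc] + pages[p] + req_load[proc] - 1000 * gain
            if best_score is None or score < best_score:
                best, best_score = proc, score
        assign[p] = best
        size_load[best] += pages[p]
        req_load[best] += 1
    return assign

def cut_links(links, assign):
    """Hyperlinks crossing processor boundaries (communication proxy)."""
    return sum(1 for a, b in links if assign[a] != assign[b])
```

On a toy graph of two linked page pairs, the heuristic keeps each pair on one processor, so no links are cut; real partitioners (and the models in the paper) balance both constraints under explicit tolerance bounds rather than a single ad hoc score.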
