A dynamic URL assignment method for parallel web crawler


Abstract

A web crawler is a relatively simple automated program or script that methodically scans, or "crawls", Internet pages to retrieve information. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer. Web crawlers have many uses. Their primary purpose is to collect data so that when Internet users enter a search term on a search site, the site can quickly return relevant web sites. In this work we propose a model of a low-cost web crawler for distributed environments, based on an efficient URL assignment algorithm. The function of every module of the crawler is analyzed, and the main rules that crawlers must follow to maintain load balancing and system robustness while searching the web simultaneously are discussed. The proposed dynamic URL assignment method, based on grid computing technology and dynamic clustering, efficiently increases web crawler performance.
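The abstract does not give the details of the assignment algorithm, so the following is only a minimal sketch of what dynamic URL assignment with load balancing can look like in general. It is not the authors' method: it replaces their grid-computing and dynamic-clustering approach with a simple least-loaded rule plus host affinity. The class `DynamicAssigner`, its method names, and the crawler identifiers are all hypothetical.

```python
from urllib.parse import urlsplit


class DynamicAssigner:
    """Hypothetical assigner: host affinity plus least-loaded placement.

    Assumption: this is an illustrative stand-in, not the algorithm
    proposed in the paper.
    """

    def __init__(self, crawler_ids):
        # Number of pending URLs per crawler process.
        self.load = {cid: 0 for cid in crawler_ids}
        # Host name -> crawler that currently owns that host.
        self.host_owner = {}

    def assign(self, url):
        """Return the crawler id that should fetch this URL."""
        host = urlsplit(url).hostname or ""
        owner = self.host_owner.get(host)
        if owner is None:
            # New host: pick the crawler with the fewest pending URLs
            # (ties are broken by crawler id so the result is deterministic).
            owner = min(self.load, key=lambda cid: (self.load[cid], cid))
            self.host_owner[host] = owner
        self.load[owner] += 1
        return owner

    def done(self, crawler_id):
        """Record that a crawler finished one URL."""
        self.load[crawler_id] = max(0, self.load[crawler_id] - 1)

    def remove_crawler(self, crawler_id):
        """Drop a failed crawler and release its hosts for reassignment."""
        del self.load[crawler_id]
        self.host_owner = {
            host: cid
            for host, cid in self.host_owner.items()
            if cid != crawler_id
        }


assigner = DynamicAssigner(["c0", "c1"])
first = assigner.assign("http://a.example/1")
second = assigner.assign("http://b.example/1")
same_host = assigner.assign("http://a.example/2")
```

Keeping every URL of one host on the same crawler avoids two processes fetching the same site at once, while the least-loaded rule spreads new hosts across crawlers. Releasing the hosts of a removed crawler is one simple way to keep the system running when a node fails.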


Bibliographic details

Source: 2010 IEEE International Conference on Computational Intelligence for Measurement Systems and Applications, 2010, pp. 119-123 (5 pages)
Authors: Guerriero A.; Ragni F.; Martines C.
Original format: PDF
Chinese Library Classification: TP216
Keywords: Cache memories; Distributed computing; Fuzzy clustering


