Design of a Parallel and Scalable Crawler for the Hidden Web

Sonali Gupta; Komal Kumar Bhatia

Abstract

The World Wide Web (WWW) contains a huge amount of information from many different areas, in the form of web pages, media, articles (research journals and magazines), blogs, and so on. A major portion of this information resides in web databases and can be retrieved only by submitting queries through the interface that each database offers; this portion is called the hidden web. An important challenge is to retrieve and provide access to this enormous amount of information efficiently through crawling. In this paper, the authors present the architecture of a parallel crawler for the hidden web that avoids download overlaps by following a domain-specific approach. The experimental results further show that the proposed parallel hidden web crawler (PSHWC) effectively and efficiently extracts and downloads the contents of hidden web databases.
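The abstract does not say how the domain-specific approach avoids download overlaps. One plausible reading is that each topical domain is owned by a single crawler process, so no two processes are ever handed the same URL. The following is a minimal sketch under that assumption, not the authors' implementation. The names `DOMAIN_KEYWORDS`, `classify`, and `distribute` and the keyword lists are hypothetical and do not come from the paper, and the keyword matcher is only a placeholder for a real page or form classifier.

```python
from collections import defaultdict

# Hypothetical keyword lists per topical domain; not taken from the paper.
DOMAIN_KEYWORDS = {
    "books": ("book", "isbn", "author"),
    "flights": ("flight", "airline", "airport"),
}


def classify(url: str) -> str:
    """Assign a URL to a topical domain by naive keyword matching (placeholder)."""
    lowered = url.lower()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return domain
    return "other"


def distribute(urls):
    """Partition URLs so that each topical domain maps to exactly one work queue."""
    queues = defaultdict(list)
    seen = set()
    for url in urls:
        if url in seen:  # drop exact duplicates before assignment
            continue
        seen.add(url)
        queues[classify(url)].append(url)
    return dict(queues)
```

Because every URL is placed in exactly one queue and each queue would be consumed by one worker, the queues are disjoint by construction. A real system would also need URL normalization and a far more reliable classifier than substring matching.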

Bibliographic record

Source: International Journal of Information Retrieval Research, 2022, Issue 1, pp. 193-215 (23 pages)
Authors: Sonali Gupta; Komal Kumar Bhatia
Affiliation: J. C. Bose University of Science and Technology, India
Original format: PDF
Language: English
Keywords: Domain-Specific; Focused Crawler; Form Processing; Hidden Web Crawler; Parallel Crawler; Scalable Crawler; Search Engine; Topic-Specific; URL Distribution; Web Page Classification
Date added to database: 2024-01-25 19:15:53
