A Novel Architecture for Deep Web Crawler

Dilip Kumar Sharma; Shobhit University; Meerut India

Abstract

A traditional crawler picks up a URL, retrieves the corresponding page, and extracts its links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much remains undiscovered. In this paper, the authors analyze and compare important deep Web information crawling techniques to find their relative limitations and advantages. To minimize the limitations of existing deep Web crawlers, a novel architecture is proposed based on QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost effective and has features of privatized search and general search for deep Web data hidden behind HTML forms.
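The crawl loop described in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the QIIIEP-based architecture the paper proposes. The names `LinkAndFormParser`, `crawl`, `fetch`, and `PAGES`, and the `example.test` URLs, are assumptions made for the example. Form processing (filling in and submitting a form) is left out; the sketch only records where forms were found.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndFormParser(HTMLParser):
    """Collects anchor hrefs and form action URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            # A form without an action submits to the page's own URL.
            self.forms.append(urljoin(self.base_url, attrs.get("action") or ""))


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from `seed`.

    `fetch` is a hypothetical callable mapping a URL to HTML text or None.
    Returns (visited URLs in order, {page URL: [form action URLs]}).
    """
    queue = deque([seed])
    seen = {seed}
    visited = []
    forms_found = {}
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # pick up a URL
        html = fetch(url)              # retrieve the corresponding page
        if html is None:
            continue
        visited.append(url)
        parser = LinkAndFormParser(url)
        parser.feed(html)
        for link in parser.links:      # extract links and add them to the queue
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if parser.forms:               # deep Web step: check the page for forms
            forms_found[url] = parser.forms
    return visited, forms_found


# In-memory stand-in for the network, so the sketch runs offline.
PAGES = {
    "http://example.test/": '<a href="/search">Search</a><a href="/about">About</a>',
    "http://example.test/search": '<form action="/results" method="get"><input name="q"></form>',
    "http://example.test/about": '<a href="/">Home</a>',
}

visited, forms = crawl("http://example.test/", PAGES.get)
print(visited)
print(forms)
```

Passing `fetch` as a parameter keeps the loop independent of any particular HTTP client; a real crawler would substitute a function that performs the request and would add form handling at the point where forms are detected.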


Bibliographic record

Source
International Journal of Information Technology and Web Engineering, 2011, Issue 1, pp. 25-48 (24 pages)
Authors
Dilip Kumar Sharma; Shobhit University; Meerut India;
Author affiliations

A. K. Sharma, YMCA University of Science and Technology, Faridabad, India;

Original format: PDF
Language: English
Keywords
authenticate crawling; deep web; hidden web; invisible web; QIIIEP; web crawlers;

Similar literature

1. Architecture specification of rule-based deep web crawler with indexer [J]. S.G. Shaila, A. Vadivel. International journal of knowledge and web intelligence. 2013, Issue 2a3
2. Elastic Web Crawler Service-Oriented Architecture Over Cloud Computing [J]. ElAraby M. E., Moftah Hossam M., Abuelenin Sherihan M. Arabian Journal for Science and Engineering. 2018, Issue 12
3. Highly Efficient Architecture for Scalable Focused Crawling Using Incremental Parallel Web Crawler [J]. P. Jaganathan, T. Karthikeyan. Journal of computer sciences. 2015, Issue 1
4. A New Architecture of an Intelligent Agent-Based Crawler for Domain-Specific Deep Web Databases [C]. Li Yanni, Wang Yuping, Tian Erfeng. IEEE/WIC/ACM International Conference on Web Intelligence (WI 2012) and IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2012), with co-located workshops. 2012
5. Constructing Web Crawlers for the World Art Dynamics Technology Platform [D]. Guo, Xueyuan. 2019
6. A user-oriented web crawler for selectively acquiring online content in e-health research [O]. Songhua Xu, Hong-Jun Yoon, Georgia Tourassi
7. SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERFACES [O]. 2017
