
A Novel Architecture for Deep Web Crawler


Abstract

A traditional crawler picks up a URL, retrieves the corresponding page, extracts its links, and adds them to the queue. A deep Web crawler, after adding links to the queue, also checks the page for forms. If forms are present, it processes them and retrieves the required information. Various techniques have been proposed for crawling deep Web information, but much remains undiscovered. In this paper, the authors analyze and compare important deep Web crawling techniques to identify their relative limitations and advantages. To minimize the limitations of existing deep Web crawlers, a novel architecture is proposed based on the QIIIEP specifications (Sharma & Sharma, 2009). The proposed architecture is cost-effective and supports both privatized search and general search for deep Web data hidden behind HTML forms.
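
The crawl loop described in the abstract can be illustrated with a minimal Python sketch. It is not the QIIIEP-based architecture the paper proposes; it only shows the generic difference between a traditional crawler (fetch, extract links, enqueue) and a deep Web crawler (the same loop plus a form-handling step). The names `LinkAndFormParser`, `crawl`, `fetch`, and `process_form` are hypothetical, and page retrieval is passed in as a function so the sketch stays independent of any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndFormParser(HTMLParser):
    """Collects <a href> targets and <form action> targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form":
            # A form without an action submits to the page's own URL.
            self.forms.append(urljoin(self.base_url, attrs.get("action") or ""))


def crawl(seed, fetch, process_form=None, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` is an assumed callable returning HTML text or None.
    With `process_form=None` this behaves like a traditional crawler;
    passing a callable adds the deep Web step of handling each form.
    """
    queue = deque([seed])
    seen = {seed}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkAndFormParser(url)
        parser.feed(html)
        # Traditional step: add newly discovered links to the queue.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        # Deep Web step: after queuing links, check for forms and process them.
        if process_form is not None:
            for action in parser.forms:
                process_form(url, action)
    return visited
```

In a real deep Web crawler, `process_form` is where query values would be chosen and submitted; here it is left as a callback because the abstract does not specify how the proposed architecture fills forms.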
