International Journal of Software Engineering and Knowledge Engineering

Retrieving Deep Web Data Through Multi-Attributes Interfaces With Structured Queries



Abstract

A great deal of data on the Web lies in hidden databases, also known as the deep Web. Most deep Web data is not directly available and can only be accessed through query interfaces. Current research on deep Web search has focused on crawling deep Web data through Web interfaces using keyword queries. However, these keyword-based methods have inherent limitations because deep Web sources expose multi-attribute interfaces and return only the top-k results for each query. In this paper we propose a novel approach for siphoning structured data with structured queries. First, in order to retrieve all the data in a hidden database without repetition, we model the hidden database as a hierarchy tree. Under this theoretical framework, data retrieval is transformed into a tree traversal problem. We also propose techniques that narrow the query space by using a heuristic rule, based on mutual information, to guide the traversal process. We conduct extensive experiments over real deep Web sites and controlled databases to illustrate the coverage and efficiency of our techniques.
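The abstract does not spell out how the tree is built or what the heuristic rule is, so the Python below is only a minimal sketch under stated assumptions, not the authors' method. It assumes a form that accepts exact-match conditions on categorical attributes with known value domains (for example drop-down options) and returns at most k rows plus a signal that the result was cut off. Each tree level binds one more attribute. A node whose query does not overflow is a leaf whose rows are complete, and because sibling queries partition their parent's result set, no record is fetched twice. The attribute-ordering rule (highest entropy first, then least mutual information with the attributes already bound) is an assumed stand-in for the paper's mutual-information heuristic. `TopKInterface`, `order_attributes`, and `crawl` are hypothetical names.

```python
import math
from collections import Counter


class TopKInterface:
    """Hypothetical stand-in for a deep Web form: exact-match conditions
    on categorical attributes, at most k rows returned per query."""

    def __init__(self, records, k):
        self.records = records
        self.k = k
        self.queries_issued = 0

    def query(self, conditions):
        self.queries_issued += 1
        hits = [r for r in self.records
                if all(r[a] == v for a, v in conditions.items())]
        # Only the top k rows are visible; the flag says the result was cut off.
        return hits[:self.k], len(hits) > self.k


def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def order_attributes(sample, attributes):
    """Assumed heuristic: bind the highest-entropy attribute first, then
    repeatedly pick the attribute sharing the least mutual information
    with the attributes already chosen (estimated from a sample)."""
    cols = {a: [r[a] for r in sample] for a in attributes}
    remaining = list(attributes)
    order = [max(remaining, key=lambda a: entropy(cols[a]))]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda a: sum(
            mutual_information(cols[a], cols[b]) for b in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def crawl(interface, domains, order):
    """Depth-first traversal of the query tree. Level d binds order[d].
    Assuming every record value appears in domains, the children of a node
    partition its result set, so no record is returned twice."""
    retrieved, truncated = [], []

    def visit(conditions, depth):
        rows, overflow = interface.query(conditions)
        if not overflow:
            retrieved.extend(rows)  # complete result page: stop descending
            return
        if depth == len(order):
            truncated.append(dict(conditions))  # all attributes bound, still > k
            return
        attr = order[depth]
        for value in domains[attr]:
            visit({**conditions, attr: value}, depth + 1)

    visit({}, 0)
    return retrieved, truncated


# Toy hidden database: 18 distinct cars behind a top-4 interface.
domains = {"make": ["audi", "bmw", "ford"],
           "color": ["red", "blue"],
           "year": [2019, 2020, 2021]}
records = [{"make": m, "color": c, "year": y}
           for m in domains["make"]
           for c in domains["color"]
           for y in domains["year"]]
iface = TopKInterface(records, k=4)
order = order_attributes(records, list(domains))
retrieved, truncated = crawl(iface, domains, order)
print(order, len(retrieved), len(truncated), iface.queries_issued)
```

In a real crawl the sample passed to `order_attributes` would have to come from rows already retrieved (for instance the root query's top-k page) rather than from the full database. A node listed in `truncated` marks records that exact-match queries alone cannot separate under the top-k limit.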
