
WICCAP: from semi-structured data to structured data


Abstract

Web data extraction is a technique for extracting and integrating data from Web-based semistructured sources. Wrappers act as the kernel of a Web data extraction system, mediating between users and a large number of heterogeneous data sources. Typically, they process semistructured documents that are generated from structured databases according to rules that are usually hidden from users. Much research has explored ways to represent knowledge of these hidden rules and has applied techniques such as grammar induction and inductive logic programming to discover them, so that wrappers can use them to extract data. An important property of semistructured data is its hierarchical structure. Intuitively, this structural information can be used to generate wrappers. We describe a Web data extraction system, WICCAP, and its internal Web Data Extraction Language (WDEL), which provides a unified view of Web data resources and extracted data. We describe some of WICCAP's rule generation features and give a detailed description of the internal language and its implementation. We have conducted experiments that show the ease of generating wrappers with this approach.
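
To make the idea of exploiting hierarchical structure concrete, the following is a minimal sketch of a wrapper driven by a path-based rule. It is not WDEL and does not reproduce WICCAP's implementation: the rule format (`record_path`, `fields`), the helper names (`Node`, `TreeBuilder`, `select`, `extract`), and the sample page are all hypothetical, chosen only for illustration. The sketch assumes well-nested HTML and uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class Node:
    """One element of the parsed document tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; assumes reasonably well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then close it.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def select(node, path):
    """Follow a list of tag names level by level through direct children."""
    if not path:
        return [node]
    head, rest = path[0], path[1:]
    matches = []
    for child in node.children:
        if child.tag == head:
            matches.extend(select(child, rest))
    return matches


def extract(html, record_path, fields):
    """Apply a hierarchical rule: locate records, then fields inside each."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    records = []
    for record in select(builder.root, record_path):
        row = {}
        for name, field_path in fields.items():
            hits = select(record, field_path)
            row[name] = hits[0].text.strip() if hits else None
        records.append(row)
    return records


# Hypothetical page and rule, for illustration only.
PAGE = """<html><body><ul>
<li><h3>Title A</h3><span>Summary A</span></li>
<li><h3>Title B</h3><span>Summary B</span></li>
</ul></body></html>"""

RULE = {
    "record_path": ["html", "body", "ul", "li"],
    "fields": {"title": ["h3"], "summary": ["span"]},
}

records = extract(PAGE, RULE["record_path"], RULE["fields"])
```

Here the rule mirrors the nesting of the page: one path locates each record, and shorter relative paths locate the fields inside it. A real system would also need to cope with void elements, malformed markup, and changes in page layout, none of which this sketch handles.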
