Conference: International Conference on Conceptual Modeling

A Unified Framework for Wrapping, Mediating and Restructuring Information from the Web


Abstract

The goal of information extraction from the Web is to provide an integrated view of heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, leading to an "impedance mismatch" between the wrapper and mediator levels. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model that represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language that is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the Florid system [10].
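
The idea of keeping wrappers and mediators in one formalism over one data model can be illustrated with a minimal Python sketch. This is not Florid syntax and not the paper's rule language; the source snippets, the `City` class, and the function names `wrap_source_a`, `wrap_source_b`, and `mediate` are all hypothetical and chosen only for illustration. Each wrapper uses regular-expression pattern matching to lift facts out of a page, and the mediator merges them into a single integrated view.

```python
import re
from dataclasses import dataclass

# Hypothetical HTML fragments standing in for two heterogeneous Web sources.
SOURCE_A = (
    "<li><b>Berlin</b> population: 3,400,000</li>"
    "<li><b>Hamburg</b> population: 1,700,000</li>"
)
SOURCE_B = "<tr><td>Munich</td><td>1200000</td></tr>"


@dataclass(frozen=True)
class City:
    """Common data model shared by the wrappers and the mediator (assumed)."""
    name: str
    population: int


def wrap_source_a(html: str) -> list[City]:
    """Wrapper for source A: list items with comma-grouped numbers."""
    pattern = re.compile(
        r"<li><b>(?P<name>[^<]+)</b> population: (?P<pop>[\d,]+)</li>"
    )
    return [
        City(m["name"], int(m["pop"].replace(",", "")))
        for m in pattern.finditer(html)
    ]


def wrap_source_b(html: str) -> list[City]:
    """Wrapper for source B: table rows with plain integers."""
    pattern = re.compile(r"<tr><td>(?P<name>[^<]+)</td><td>(?P<pop>\d+)</td></tr>")
    return [City(m["name"], int(m["pop"])) for m in pattern.finditer(html)]


def mediate(*extractions: list[City]) -> list[City]:
    """Mediator: merge wrapper outputs, keeping the first fact seen per city."""
    merged: dict[str, City] = {}
    for cities in extractions:
        for city in cities:
            merged.setdefault(city.name, city)
    return sorted(merged.values(), key=lambda c: c.name)


cities = mediate(wrap_source_a(SOURCE_A), wrap_source_b(SOURCE_B))
for city in cities:
    print(city.name, city.population)
```

Because both wrappers emit the same `City` objects that the mediator consumes, no translation layer is needed between the two levels, which is the kind of mismatch the abstract argues a unified framework avoids.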
