Scientific data integration: wrapping textual documents with a database view mechanism and an XML engine

Abstract

Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or from documents retrieved from the World Wide Web. We present an approach to querying flat files as well as Web data sources through an object database view based on a database system and a wrapper. Generally, a wrapper has two tasks: first, it sends a query to the source to retrieve data; second, it builds the expected output with respect to the virtual structure. Scientific data servers, and in particular the ones publicly available on the Web, usually provide information retrieval techniques to access data. Our wrappers perform these two tasks with a retrieval component, based on an intermediate object view mechanism called 'search views' that maps the source capabilities to attributes, and an XML engine. Although the retrieval component is specific to each data source, our approach shows that the extraction component (the XML engine) can be common. We describe our system and focus on the retrieval component of the Object-Web Wrapper (OWW) for Web sources. The originality of our approach consists of (1) a common wrapper architecture for flat files and Web data sources that share an XML engine for data extraction, (2) a generic view mechanism to access data sources with limited capabilities, and (3) the representation of hyperlinks as abstract attributes in the object view as well as their use in the search view. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform Object Protocol Model (OPM) interfaces.
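
The two-task wrapper structure described in the abstract can be illustrated with a minimal Python sketch. It is not the OWW implementation, whose details the abstract does not give. All names here (`SearchView`, `Wrapper`, `extract_objects`, `fake_retrieve`, the `gene` records and the `example.org` links) are hypothetical, and the source is simulated in memory rather than being a real flat file or Web server. The sketch assumes that a search view is a mapping from view attributes to the search parameters a source supports, that retrieval is source-specific, and that a single XML-based extraction step is shared by all sources.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchView:
    """Hypothetical search view: view attribute -> search parameter of the source."""
    attribute_to_param: Dict[str, str]

    def to_source_query(self, conditions: Dict[str, str]) -> Dict[str, str]:
        # A source with limited capabilities can only be searched on mapped attributes.
        unsupported = set(conditions) - set(self.attribute_to_param)
        if unsupported:
            raise ValueError(f"source cannot search on: {sorted(unsupported)}")
        return {self.attribute_to_param[a]: v for a, v in conditions.items()}


def extract_objects(xml_text: str, record_tag: str, fields: List[str]) -> List[Dict[str, str]]:
    """Common extraction step: turn each XML record into an attribute dictionary."""
    root = ET.fromstring(xml_text)
    return [{f: (rec.findtext(f) or "") for f in fields} for rec in root.iter(record_tag)]


@dataclass
class Wrapper:
    search_view: SearchView
    retrieve: Callable[[Dict[str, str]], str]  # source-specific; returns XML text
    record_tag: str
    fields: List[str]

    def query(self, conditions: Dict[str, str]) -> List[Dict[str, str]]:
        source_query = self.search_view.to_source_query(conditions)  # task 1: retrieve
        xml_text = self.retrieve(source_query)
        return extract_objects(xml_text, self.record_tag, self.fields)  # task 2: build output


def fake_retrieve(params: Dict[str, str]) -> str:
    """Stand-in for a flat-file scan or an HTTP request to a Web source."""
    rows = [
        {"acc": "G1", "org": "yeast", "href": "http://example.org/G1"},
        {"acc": "G2", "org": "mouse", "href": "http://example.org/G2"},
    ]
    hits = [r for r in rows if all(r.get(k) == v for k, v in params.items())]
    root = ET.Element("results")
    for r in hits:
        rec = ET.SubElement(root, "gene")
        ET.SubElement(rec, "id").text = r["acc"]
        ET.SubElement(rec, "organism").text = r["org"]
        ET.SubElement(rec, "link").text = r["href"]  # hyperlink exposed as an attribute
    return ET.tostring(root, encoding="unicode")


wrapper = Wrapper(SearchView({"organism": "org"}), fake_retrieve, "gene", ["id", "organism", "link"])
result = wrapper.query({"organism": "yeast"})
print(result)
```

In this sketch, supporting another source would mean supplying a different `retrieve` function and `SearchView` mapping while reusing `extract_objects` unchanged, which mirrors the abstract's claim that the extraction component can be common.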
