Querying distributed heterogeneous structured and semi-structured data sources

Abstract

The continuing growth and widespread popularity of the internet mean that the collection of useful data available for public access is rapidly increasing in both number and size. These data are spread over distributed heterogeneous data sources, such as traditional databases and sources of various forms containing unstructured and semi-structured data. The value of these data sources would in many cases be greatly enhanced if the data they contain could be combined and queried in a uniform manner. The research reported in this dissertation is concerned with querying and integrating multiple distributed heterogeneous sources: structured data residing in relational databases and semi-structured data held in well-formed XML documents, whether produced by internet applications or coded by hand. In particular, we have addressed the problems of (1) specifying the mappings between a global schema and the local data sources' schemas, and resolving the heterogeneity that can occur between data models, schemas or schema concepts; and (2) translating queries expressed on a global schema into local queries. We have proposed an approach that combines and queries the data sources through a mediation layer. This layer incrementally establishes and evolves an XML Metadata Knowledge Base (XMKB), which assists the Query Processor in mediating between user queries posed over the global schema and the queries on the underlying distributed heterogeneous data sources. The layer translates such queries into sub-queries, called local queries, that are appropriate to each local data source. The XMKB is built bottom-up by incrementally extracting and merging the metadata of the data sources. It holds information about the data sources (names, types and locations), descriptions of the mappings between the global schema and the participating data source schemas, and the names of functions for handling semantic and structural discrepancies between the representations.
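As a rough illustration of the kind of knowledge base described above, the following sketch builds a small XML document incrementally, one data source at a time. It is not the thesis's actual XMKB format: the element and attribute names (`DataSources`, `GlobalPath`, `LocalPath`, `function`), the source names, the locations and the function name `join_first_last` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def new_xmkb():
    """Create an empty knowledge base (hypothetical layout, not the thesis's)."""
    root = ET.Element("XMKB")
    ET.SubElement(root, "DataSources")
    ET.SubElement(root, "Mappings")
    return root


def add_source(xmkb, name, kind, location):
    """Record a data source's name, type and location."""
    ET.SubElement(xmkb.find("DataSources"), "Source",
                  name=name, type=kind, location=location)


def add_mapping(xmkb, global_path, source, local_path, function=None):
    """Merge one global-to-local path mapping into the knowledge base.

    Mappings that share a global path are grouped under one element, so the
    document grows incrementally as further sources are integrated.
    """
    mappings = xmkb.find("Mappings")
    entry = next((m for m in mappings.findall("GlobalPath")
                  if m.get("path") == global_path), None)
    if entry is None:
        entry = ET.SubElement(mappings, "GlobalPath", path=global_path)
    attrs = {"source": source, "path": local_path}
    if function is not None:
        # Name of a function that resolves a semantic or structural discrepancy.
        attrs["function"] = function
    ET.SubElement(entry, "LocalPath", attrs)


# Integrate two hypothetical sources, one after the other.
xmkb = new_xmkb()
add_source(xmkb, "hr_db", "relational", "jdbc:example://host/hr")
add_mapping(xmkb, "/staff/person/name", "hr_db", "EMPLOYEE.FULL_NAME")
add_source(xmkb, "papers_xml", "xml", "file:///data/papers.xml")
add_mapping(xmkb, "/staff/person/name", "papers_xml",
            "/authors/author/name", function="join_first_last")

xmkb_text = ET.tostring(xmkb, encoding="unicode")
```

Grouping local paths under a shared global path is one possible design choice; it keeps everything needed to answer a query on that path in one place.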
To demonstrate our research, we have designed and implemented a prototype system called SISSD (System to Integrate Structured and Semi-structured Databases). The system automatically creates a GUI tool that meta-users (those who perform the metadata integration) use to describe mappings between the global schema and local data source schemas. These mappings are used to produce the XMKB. SISSD translates user queries into sub-queries suited to each participating data source by exploiting the mapping information stored in the XMKB. The major results of the thesis are: (1) an approach that facilitates building structured and semi-structured data integration systems; (2) a method for generating mappings between the paths of a global schema and those of local schemas, and for resolving the conflicts caused by the heterogeneity of the data sources, such as naming, structural and semantic conflicts that may occur between the schemas; and (3) a method for translating queries in terms of a global schema into sub-queries in terms of local schemas. The presented approach shows that (a) mapping of the schemas' paths can only be partially automated, since the logical heterogeneity problems need to be resolved by human judgment based on the application requirements; and (b) querying distributed heterogeneous structured and semi-structured data sources is possible.
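The translation step summarised above can be sketched as a lookup from a global schema path to one sub-query per participating source. This is a minimal sketch under stated assumptions, not SISSD's implementation: the abstract does not name the local query languages, so SQL for relational sources and XPath for XML sources are assumed here, and the mapping table, source names and paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalMapping:
    source: str      # hypothetical data source name
    kind: str        # "relational" or "xml"
    local_path: str  # "TABLE.COLUMN" for relational sources, an XPath for XML


# Hypothetical mapping information: global schema path -> local paths.
MAPPINGS = {
    "/staff/person/name": [
        LocalMapping("hr_db", "relational", "EMPLOYEE.FULL_NAME"),
        LocalMapping("papers_xml", "xml", "/authors/author/name"),
    ],
}


def translate(global_path):
    """Return one (source, sub-query) pair for each source mapping the path."""
    sub_queries = []
    for m in MAPPINGS.get(global_path, []):
        if m.kind == "relational":
            # Assumed form: a projection of one column from one table.
            table, column = m.local_path.split(".", 1)
            query = f"SELECT {column} FROM {table}"
        else:
            # Assumed form: the local XPath itself serves as the sub-query.
            query = m.local_path
        sub_queries.append((m.source, query))
    return sub_queries


sub_queries = translate("/staff/person/name")
for source, query in sub_queries:
    print(f"{source}: {query}")
```

A global path with no mapping yields an empty list, which mirrors the idea that only sources holding relevant data receive a sub-query.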

Bibliographic details

  • Author

    Al-Wasil Fahad M

  • Year: 2007
  • Original format: PDF
  • Language: English
