Query processing and optimization for structural selection queries over XML data.

Abstract

Tree pattern matching is considered a core retrieval operation in XML query processing, as it enables the selection of XML data based on structural characteristics. Simple value-based schemes can effectively capture the structural relationships between XML data and convert structural constraints to value predicates. The first part of my thesis demonstrates that standard index structures (i.e., B+-trees) can be integrated with such schemes to achieve considerable performance gains when processing structure-aware queries, because they provide early pruning of input data that contain no useful information for a given query. In particular, a family of novel index-based structural matching algorithms has been designed and implemented, and their clear advantage over all previous relevant methods has been demonstrated experimentally. Subsequent work also demonstrates that processing methods based on similar principles can be used to handle generalized tree pattern queries with relaxed semantics.

The second part of my thesis considers the integration of all existing tree pattern query processing methods under a common optimization framework. New opportunities, derived from the tree structure of the data, have been explored, and a new optimization strategy has been proposed that avoids the inherent problems of traditional solutions, namely large search spaces and heavy dependence on intermediate result size estimation. In particular, based on avoiding plans with unnecessarily large intermediate results, new holistic processing algorithms have been developed that can exploit existing access methods and that offer performance guarantees. As part of the optimization process, those holistic approaches are combined in a cost-based fashion. The holistic nature of the algorithms enables the definition of a cost model that (i) mitigates the propagation of estimation errors and (ii) enables global optimization while examining a small search space that explores only local decisions.

The third part investigates the problem of structural selection over XML data in an environment where different versions of the data can co-exist. In such environments, users wish to retrieve portions of document versions at will. Path expression queries can effectively specify the portions of a particular version to be retrieved. Techniques that facilitate data version management and make it possible to answer such queries over document versions are therefore important. As part of this work, novel storage schemes have been designed and proposed that integrate all document versions in a space-efficient manner while enabling efficient structural selection.

While the work in the first three parts of the thesis targets the unordered model of XML data, which is sufficient for the data-centric aspect of the language, order becomes very important when document-centric applications are considered. Order-based queries include positional constraints and XPath navigation axes such as the following-sibling axis. The fourth part of my thesis deals with the efficient evaluation of such order-based queries. Novel holistic algorithms have been proposed that evaluate those queries in a unified way and avoid materializing large intermediate results. To the best of our knowledge, this is the first work that provides a complete, scalable, XML model-aware solution to the problem of supporting order within XML query processing.
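The abstract does not say which value-based scheme the thesis uses. As an illustration only, the sketch below assumes the widely used region (start, end, level) numbering, under which an ancestor-descendant constraint becomes a pair of integer comparisons. The names `Region`, `label`, `is_ancestor`, `is_parent`, and `descendants` are hypothetical, and a sorted Python list searched with `bisect` stands in for a B+-tree range lookup; this is not the thesis's algorithm.

```python
from bisect import bisect_right
from typing import NamedTuple
import xml.etree.ElementTree as ET


class Region(NamedTuple):
    start: int  # counter value when the element opens
    end: int    # counter value when the element closes
    level: int  # depth in the tree (root = 0)


def label(xml_text):
    """Label every element; return {tag: [Region, ...]} sorted by start."""
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        labels.setdefault(node.tag, []).append(Region(start, counter, level))

    visit(ET.fromstring(xml_text), 0)
    for regions in labels.values():
        regions.sort()  # tuples sort by start first
    return labels


def is_ancestor(a, d):
    """Structural constraint expressed as a value predicate."""
    return a.start < d.start and d.end < a.end


def is_parent(a, d):
    return is_ancestor(a, d) and d.level == a.level + 1


def descendants(ancestor, candidates):
    """Range lookup over candidates sorted by start.

    Binary search skips every candidate that starts outside the
    ancestor's region, which is the kind of early pruning an index
    makes possible. Regions nest properly, so any candidate that
    starts inside the ancestor also ends inside it.
    """
    starts = [c.start for c in candidates]
    lo = bisect_right(starts, ancestor.start)
    hi = bisect_right(starts, ancestor.end)
    return candidates[lo:hi]


# Hypothetical sample document.
DOC = ("<book><chapter><title/><section><title/></section></chapter>"
       "<title/></book>")
```

For the query `chapter//title`, `descendants(label(DOC)["chapter"][0], label(DOC)["title"])` returns the two titles nested under the chapter and skips the book-level title without comparing against it. A real system would keep the sorted `start` values in a disk-based index rather than rebuild the list on every call.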

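Bibliographic record

  • Author: Vagena, Zografoula
  • Author affiliation: University of California, Riverside
  • Degree-granting institution: University of California, Riverside
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 170 p.
  • Total pages: 170
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords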
