Thor: A universal XML index for efficient XPath query processing.

Abstract

XML is still in its infancy, but its rapid growth and its potential to become the data format of choice for many years to come are a powerful motivation for developing the highest-performance index possible for XPath. High-speed XPath processing is essential for XQuery, since a single XQuery query may require the execution of a large number of nested XPath queries. Relational database management systems (RDBMSs) have a very large installed base backed by substantial investment, so much effort has gone into protecting and reusing existing relational technology by mapping XPath requirements onto existing RDBMS architectures, either by encoding limited additional data into the B+Tree index structure or by encoding the XML structure into relational tables. In general, however, the semi-structured content and hierarchical tree structure of an XML document do not fit well into the relational model. To address this need, a large number of native XML databases (NXDBs) have recently been released. Although commercial database systems can process XPath, they are largely optimized for rapid processing of ancestor-descendant and value-based queries; structure navigation in these systems is still relatively slow and can be improved. State-of-the-art XML-aware database systems have therefore yet to reach peak performance.

In this dissertation, to address the XPath query performance challenge, we propose an application-specific hierarchical navigational index system that can be built on top of a relational database tuple storage system. The new index is named THOR (Threaded Hierarchical on Relational) and the system THOR4XP (Threaded Hierarchical on Relational for XPath); the work was previously known in the literature as MTree (Multi-Threaded Tree). The main contributions are: (1) introduction of following and preceding threaded pointers into a tree data structure; (2) integration of multiple doubly linked, threaded node-label paths into a tree; (3) combination of the TSPath path summary index with a structure index to improve query performance; (4) in situ indexing, which eliminates the need for data model and query model transformations; (5) the Hash Path Join (HPJ), Optimal Path Join (OPJ) and Subtree Path Join (SPJ) algorithms for use with navigational indexes; (6) an index design that substantially reduces the need to sort intermediate sequences; (7) an enriched index leaf structure that encodes both node properties and edge properties; (8) composite, partial, multilevel intermediate node structures that provide multiple access paths into the same leaf set and tuple set and are integrated with a value index; (9) application as a distributed P2P routing and indexing method; (10) a holistic system design and implementation of a new hierarchical index that works in combination with underlying relational storage systems, making it feasible to integrate the index into relational database management systems.

The join algorithms efficiently resolve element-name-specific XPath navigational queries, in many cases without sorting intermediate sequences or filtering them by qualified name. The optimization methods apply to all axes but are presented for the four major XPath axes: descendant, ancestor, following and preceding. Experimental results are included that show substantial performance improvements over other well-known methods.
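For readers unfamiliar with the four axes named in the abstract, the sketch below shows what each one selects. It is not the THOR index and does not use threaded pointers or any of the HPJ, OPJ or SPJ join algorithms; it uses the generic pre-order/post-order numbering technique instead, and it scans every node rather than navigating an index. The function names `number_nodes` and `axis` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def number_nodes(root):
    """Assign (pre, post) ranks to every element in one depth-first walk."""
    ranks = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # rank when the node is first entered
        counters["pre"] += 1
        for child in node:
            visit(child)
        ranks[node] = (pre, counters["post"])  # post rank when the node is left
        counters["post"] += 1

    visit(root)
    return ranks


def axis(ranks, context, name):
    """Return the elements on the named axis of `context`, in document order."""
    cpre, cpost = ranks[context]
    tests = {
        "descendant": lambda pre, post: pre > cpre and post < cpost,
        "ancestor": lambda pre, post: pre < cpre and post > cpost,
        "following": lambda pre, post: pre > cpre and post > cpost,
        "preceding": lambda pre, post: pre < cpre and post < cpost,
    }
    test = tests[name]
    hits = [(pre, node) for node, (pre, post) in ranks.items() if test(pre, post)]
    # Sort on the pre rank only, which is document order.
    return [node for pre, node in sorted(hits, key=lambda hit: hit[0])]


# Hypothetical sample document; the context node is <d>.
doc = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
ranks = number_nodes(doc)
d = doc.find("./b/d")
for name in ("ancestor", "descendant", "preceding", "following"):
    print(name, [node.tag for node in axis(ranks, d, name)])
```

Each axis here costs a full scan of the document plus a sort, which is the kind of work the abstract says a navigational index is meant to avoid.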

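Bibliographic record

  • Author

    Pettovello, P. Mark

  • Author affiliation

    Wayne State University

  • Degree-granting institution: Wayne State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 163 p.
  • Total pages: 163
  • Original format: PDF
  • Language of text: eng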