International conference on very large databases

OrientStore: A Schema Based Native XML Storage System

Abstract

The increasing number of XML repositories has provided the impetus to design and develop systems that can store and query XML data efficiently. Research to improve system performance has largely concentrated on indexing paths and optimizing XML queries. However, the storage configuration of XML data on disk also affects the efficiency of an XML data management system.

Existing XML storage strategies can be classified into two categories: native XML storage and non-native XML storage. The main distinction between them is their data model. The former is based on XML data models such as the Document Object Model (DOM) and the Object Exchange Model (OEM), while the latter is based on the traditional relational data model or an object-oriented data model. An evaluation of the alternative non-native storage strategies has been given in [6]. Here, we focus on native XML storage strategies.

Several native storage strategies have been developed in [1,2,3,5,8,11]. These can be classified into Element-Based (EB), Subtree-Based (SB) and Document-Based (DB) strategies. Both the Lore system [3] and TIMBER [1] use the classic EB strategy, where each element is an atomic unit of storage and the elements are stored in pre-order. Natix [2] is a well-known system that uses the SB strategy. It divides the XML document tree into subtrees according to the physical page size, such that each subtree is a record. The sizes of the subtrees are kept as close as possible to the size of the physical page. A split matrix is defined to ensure that correlated element nodes remain clustered. As in the EB strategy, the records are stored in pre-order. The storage module in the Apache Xindice system [8] employs the DB strategy, whereby the entire XML document constitutes a single record.
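The three record granularities can be illustrated with a minimal Python sketch. It is not the implementation of Lore, TIMBER, Natix or Xindice: the sample document, the function names, and the page capacity measured as a node count (rather than bytes) are all assumptions made for illustration, and the subtree variant omits Natix's split matrix and its packing of subtrees close to the page size.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = ("<lib><book><title>A</title><year>2001</year></book>"
       "<book><title>B</title></book></lib>")


def element_based(root):
    """EB: one record per element, listed in pre-order."""
    return [elem.tag for elem in root.iter()]


def document_based(root):
    """DB: the entire document is a single record."""
    return [ET.tostring(root, encoding="unicode")]


def subtree_size(elem):
    """Number of element nodes in the subtree rooted at elem."""
    return sum(1 for _ in elem.iter())


def subtree_based(root, page_capacity):
    """SB (simplified): a subtree that fits in a page becomes one record;
    otherwise its root is stored alone and its children are visited in turn.
    page_capacity is an assumed node count, not a byte size."""
    records = []

    def visit(elem):
        if subtree_size(elem) <= page_capacity:
            records.append([e.tag for e in elem.iter()])
        else:
            records.append([elem.tag])
            for child in elem:
                visit(child)

    visit(root)
    return records


root = ET.fromstring(DOC)
print(element_based(root))
print(len(document_based(root)))
print(subtree_based(root, page_capacity=3))
```

With a capacity of three nodes, the root element is stored alone and each book subtree becomes its own record, which shows how SB sits between the per-element and per-document extremes.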
Other variations of storage strategies can be found in NeoCore XMS [11], where the XML data is first flattened to expose only the pure XML information before it is passed on to a digital pattern process to create icons. Tamino [5] is a leading commercial native XML database, but details of its storage structure are fairly sketchy.

All the above native storage strategies are schema-independent, even though schema information in the form of an XML Schema or DTD is usually available or even indispensable. In order to facilitate data exchange, a standard schema (or DTD) is typically defined on the underlying XML files and published. Examples of available standard schemas or DTDs include Chemical Markup Language, Mathematical Markup Language, News Markup Language, etc. Popular XML datasets such as DBLP [9], the Movie database [10], Shakespeare's Plays [12] and XMark [4] each come with their own DTD. The availability of schema information is crucial to data exchange applications and query optimization. We observe that schema information also has a key role to play in designing efficient and effective storage strategies for XML management systems.

In this work, we develop a prototype native XML storage system called OrientStore. OrientStore implements two schema-guided storage strategies, namely the Element-Based Clustering (EBC) and Logical Partition-Based Clustering (LPC) strategies. In contrast with existing storage systems for XML data, OrientStore has the following unique features:
a. It concretely investigates how schema information can be utilized to reduce the storage requirement and the response time of queries.
b. It implements two schema-guided storage strategies: EBC and LPC. These strategies cluster correlated data in different ways to reduce the number of I/Os required during retrieval.
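The abstract does not specify how EBC or LPC lay out data, so the following Python sketch only illustrates the general idea of schema-guided clustering and should not be read as OrientStore's method. It assumes a hypothetical list of element types that a DTD might declare, replaces each tag name with a small integer code, and groups elements of the same declared type together. All names here (`DECLARED_TYPES`, `TYPE_CODE`, `cluster_by_type`) are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document and hypothetical DTD-declared element types.
DOC = ("<lib><book><title>A</title><year>2001</year></book>"
       "<book><title>B</title></book></lib>")
DECLARED_TYPES = ["lib", "book", "title", "year"]

# Assumption: a known schema lets tag names be stored once as integer codes.
TYPE_CODE = {name: code for code, name in enumerate(DECLARED_TYPES)}


def cluster_by_type(root):
    """Group elements by declared type code.

    Each entry keeps the element's pre-order position, so the original
    document order can still be recovered, plus its stripped text content.
    """
    clusters = defaultdict(list)
    for position, elem in enumerate(root.iter()):
        text = (elem.text or "").strip()
        clusters[TYPE_CODE[elem.tag]].append((position, text))
    return dict(clusters)


clusters = cluster_by_type(ET.fromstring(DOC))
for code, entries in sorted(clusters.items()):
    print(DECLARED_TYPES[code], entries)
```

In this toy layout, a query that touches only titles reads a single cluster instead of scanning every element, which is one way clustering correlated data could reduce I/O during retrieval.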
