
OrientStore: A Schema Based Native XML Storage System


Abstract

The increasing number of XML repositories has provided the impetus to design and develop systems that can store and query XML data efficiently. Research to improve system performance has largely concentrated on indexing paths and optimizing XML queries. However, the storage configuration of XML data on disk also affects the efficiency of an XML data management system.

Existing XML storage strategies fall into two categories: native XML storage and non-native XML storage. The main distinction between them is their data model. The former is based on XML data models such as the Document Object Model (DOM) and the Object Exchange Model (OEM), while the latter is based on the traditional relational data model or an object-oriented data model. An evaluation of the alternative non-native storage strategies is given in [6]. Here, we focus on native XML storage strategies.

Several native storage strategies have been developed in [1,2,3,5,8,11]. These can be classified as Element-Based (EB), Subtree-Based (SB), and Document-Based (DB). Both the Lore system [3] and TIMBER [1] use the classic EB strategy, where each element is an atomic unit of storage and elements are stored in pre-order. Natix [2] is a well-known example of the SB strategy. It divides the XML document tree into subtrees according to the physical page size, such that each subtree is a record. The sizes of the subtrees are kept as close as possible to the size of the physical page. A split matrix is defined to ensure that correlated element nodes remain clustered. As in the EB strategy, the records are stored in pre-order. The storage module in the Apache Xindice system [8] employs the DB strategy, whereby the entire XML document constitutes a single record. Other variations can be found in NeoCore XMS [11], where the XML data is first flattened to expose only the pure XML information before it is passed on to a digital pattern process to create icons. Tamino [5] is a leading commercial native XML database, but published details of its storage structure are sketchy.

All the above native storage strategies are schema-independent, even though schema information in the form of an XML Schema or DTD is usually available or even indispensable. To facilitate data exchange, a standard schema (or DTD) is typically defined on the underlying XML files and published. Examples of available standard schemas or DTDs include Chemical Markup Language, Mathematical Markup Language, and News Markup Language. Popular XML datasets such as DBLP [9], the Movie database [10], Shakespeare's plays [12], and XMark [4] each come with their own DTD. The availability of schema information is crucial to data exchange applications and query optimization. We observe that schema information also has a key role to play in designing efficient and effective storage strategies for XML management systems.

In this work, we develop a prototype native XML storage system called OrientStore. OrientStore implements two schema-guided storage strategies: Element-Based Clustering (EBC) and Logical Partition-Based Clustering (LPC). In contrast with existing storage systems for XML data, OrientStore has the following unique features:
a. It concretely investigates how schema information can be utilized to reduce the storage requirement and the response time of queries.
b. It implements two schema-guided storage strategies, EBC and LPC. These strategies cluster correlated data in different ways to reduce the number of I/Os required during retrieval.
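
The difference in record granularity between the EB, SB, and DB strategies described above can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the implementation used by Lore, TIMBER, Natix, or Xindice: the toy document, the function names, and the greedy "fits in one page" rule (with character count standing in for page size) are all hypothetical, and the sketch omits Natix's split matrix entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document, not taken from the paper.
DOC = ("<lib><book><title>A</title><author>X</author></book>"
       "<book><title>B</title></book></lib>")


def element_based(root):
    """EB: one record per element, listed in pre-order (document order)."""
    return [el.tag for el in root.iter()]


def document_based(root):
    """DB: the entire document is a single record."""
    return [ET.tostring(root, encoding="unicode")]


def subtree_based(root, page_size):
    """SB (simplified assumption): keep a subtree whole if its serialized
    form fits in one page; otherwise store the element alone as a stub
    record and split its children recursively, in pre-order."""
    xml = ET.tostring(root, encoding="unicode")
    if len(xml) <= page_size:
        return [xml]
    records = [root.tag]  # stub record for the oversized element
    for child in root:
        records.extend(subtree_based(child, page_size))
    return records


root = ET.fromstring(DOC)
eb = element_based(root)
db = document_based(root)
sb = subtree_based(root, page_size=60)
print(len(eb), len(sb), len(db))
```

In this sketch, a very large page size makes the subtree split collapse to the single document record, while a very small one degenerates to one record per element, which is one way to see SB as sitting between EB and DB.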
