OrientStore: A Schema Based Native XML Storage System

Abstract

The increasing number of XML repositories has provided the impetus to design and develop systems that can store and query XML data efficiently. Research on improving system performance has largely concentrated on path indexing and XML query optimization. However, the storage configuration of XML data on disk also affects the efficiency of an XML data management system. Existing XML storage strategies fall into two categories: native XML storage and non-native XML storage. The main distinction between them is the data model: the former is based on XML data models such as the Document Object Model (DOM) and the Object Exchange Model (OEM), while the latter is based on the traditional relational or object-oriented data model. An evaluation of alternative non-native storage strategies is given in [6]. Here, we focus on native XML storage strategies. Several native storage strategies have been developed [1,2,3,5,8,11]; they can be classified as Element-Based (EB), Subtree-Based (SB) and Document-Based (DB). Both the Lore system [3] and TIMBER [1] use the classic EB strategy, in which each element is an atomic unit of storage and elements are stored in pre-order. Natix [2] is a well-known SB strategy. It divides the XML document tree into subtrees according to the physical page size, so that each subtree is a record whose size is kept as close as possible to the page size. A split matrix is defined to ensure that correlated element nodes remain clustered. As in the EB strategy, the records are stored in pre-order. The storage module of the Apache Xindice system [8] uses the DB strategy, in which the entire XML document constitutes a single record.
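
To make the three native layouts concrete, here is a minimal Python sketch that turns a small document tree into records under each strategy. It is a simplification under stated assumptions: node sizes follow a made-up one-byte-per-character cost model, and `subtree_based` only keeps a whole subtree together when it fits in a page; it does not pack sibling subtrees up to the page size or consult a split matrix as Natix does. All names (`Node`, `element_based`, `subtree_based`, `document_based`, `doc`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal XML element: tag, text content and child elements."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def preorder(node: Node):
    """Yield the subtree rooted at node in document (pre-)order."""
    yield node
    for child in node.children:
        yield from preorder(child)


def node_size(node: Node) -> int:
    # Hypothetical cost model: one unit per character of tag and text.
    return len(node.tag) + len(node.text)


def subtree_size(node: Node) -> int:
    return sum(node_size(n) for n in preorder(node))


def element_based(root: Node) -> list[list[str]]:
    """EB: every element is its own record; records follow pre-order."""
    return [[n.tag] for n in preorder(root)]


def document_based(root: Node) -> list[list[str]]:
    """DB: the whole document forms a single record."""
    return [[n.tag for n in preorder(root)]]


def subtree_based(root: Node, page_size: int) -> list[list[str]]:
    """SB (simplified): store a whole subtree as one record if it fits in a
    page; otherwise store the node alone and split its children recursively.
    Records are emitted in pre-order of their root nodes."""
    records: list[list[str]] = []

    def visit(node: Node) -> None:
        if subtree_size(node) <= page_size:
            records.append([n.tag for n in preorder(node)])
        else:
            records.append([node.tag])
            for child in node.children:
                visit(child)

    visit(root)
    return records


doc = Node("dblp", children=[
    Node("article", children=[Node("title", "XML"), Node("year", "2003")]),
    Node("book", children=[Node("title", "Databases")]),
])
```

For the sample `doc`, EB produces six one-element records and DB a single six-element record, while `subtree_based(doc, 25)` produces `[['dblp'], ['article', 'title', 'year'], ['book', 'title']]`: the whole tree (45 units) exceeds the page budget, so the root is stored alone, while each child subtree fits in one record.
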
Other variations of storage strategies can be found in NeoCore XMS [11], where the XML data is first flattened to expose only the pure XML information before it is passed to a digital pattern process that creates icons. Tamino [5] is a leading commercial native XML database, but the published details of its storage structure are sketchy. All of the above native storage strategies are schema-independent, even though schema information in the form of an XML Schema or DTD is usually available and sometimes even indispensable. To facilitate data exchange, a standard schema (or DTD) is typically defined for the underlying XML files and published. Examples of standard schemas or DTDs include Chemical Markup Language, Mathematical Markup Language and News Markup Language. Popular XML datasets such as DBLP [9], the Movie database [10], Shakespeare's Plays [12] and XMark [4] each come with their own DTD. The availability of schema information is crucial to data exchange applications and to query optimization. We observe that schema information also has a key role to play in designing efficient and effective storage strategies for XML management systems.

In this work, we develop a prototype native XML storage system called OrientStore. OrientStore implements two schema-guided storage strategies, namely Element-Based Clustering (EBC) and Logical Partition-Based Clustering (LPC). In contrast with existing storage systems for XML data, OrientStore has the following unique features:
a. It concretely investigates how schema information can be used to reduce the storage requirement and the query response time.
b. It implements two schema-guided storage strategies, EBC and LPC, which cluster correlated data in different ways to reduce the number of I/Os required during retrieval.
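
The abstract does not describe how EBC or LPC actually lay out records, so the sketch below is only a hypothetical illustration of the general idea that a schema can guide storage. It assumes a DTD declaring the element types listed in `SCHEMA_TYPES` (an invented example); each element record is stored under a small integer type ID instead of its tag string, and records of the same type are grouped into one cluster. `type_ids`, `cluster_by_type` and the sample data are made-up names, not OrientStore's API.

```python
from collections import defaultdict

# Hypothetical schema: element types a DTD might declare, in declaration order.
SCHEMA_TYPES = ["dblp", "article", "book", "title", "year"]


def type_ids(schema_types):
    """Map each declared element type to a small integer ID."""
    return {name: i for i, name in enumerate(schema_types)}


def cluster_by_type(elements, schema_types):
    """Group (tag, text) element records into one cluster per schema type,
    keyed by the integer type ID rather than the tag string.
    A tag not declared in the schema raises KeyError."""
    ids = type_ids(schema_types)
    clusters = defaultdict(list)
    for tag, text in elements:
        clusters[ids[tag]].append(text)
    return dict(clusters)


elements = [("dblp", ""), ("article", ""), ("title", "XML"), ("year", "2003"),
            ("book", ""), ("title", "Databases")]
clusters = cluster_by_type(elements, SCHEMA_TYPES)
```

Under this assumed layout, a query that touches only `title` elements would read a single cluster (ID 3 here) instead of scanning every element record, which is one way clustering correlated data could cut I/O.
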

Bibliographic Record

Source: Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany | 2003 | pp. 1057-1060 | 4 pages
Conference location: Berlin, Germany
Authors: Xiaofeng Meng; Daofeng Luo; Mong Li Lee; Jing An
Author affiliation: Information School, Renmin University of China, Beijing 100872, China
Original format: PDF
Language: English
CLC classification: Automation technology, computer technology
Date added: 2022-08-26 14:15:35

Similar Documents

1. [12]aneN3-based multifunctional compounds as fluorescent probes and nucleic acids delivering agents [J]. Yong-Guang Gao, Shu-Yuan Huangfu, Suryaji Patil. Drug Delivery, 2020, No. 1
2. OrientX: An Integrated, Schema Based Native XML Database System [J]. MENG Xiaofeng, WANG Xiaofeng, XIE Min. Wuhan University Journal of Natural Sciences, 2006, No. 5
3. A Systematic Approach for Changing XML Namespaces in XML Schemas and Managing their Effects on Associated XML Documents under Schema Versioning [J]. Zouhaier Brahmia, Fabio Grandi, Rafik Bouaziz. Journal of Digital Information Management, 2016, No. 5
4. OrientStore: A Schema Based Native XML Storage System [C]. Xiaofeng Meng, Daofeng Luo, Mong Li Lee. International Conference on Very Large Databases, 2003
5. An open framework code generation toolkit for distributed systems based on XML Schemas [D]. Madhusudhan Govindaraju, 2002
6. XML schemas for common bioinformatic data types and their application in workflow systems [O]. Philipp N Seibel, Jan Krüger, Sven Hartmeier, 2006
7. On stability, L2-gain and H∞ control for switched systems [O]. Jun Zhao, David J. Hill, 2008
