Efficient Processing of Very Large XML Documents in Small Space.

Abstract

The focus of this research was to develop a highly efficient serialized data structure for use in Document Object Model (DOM) query systems of Extensible Markup Language (XML) documents, which are increasingly the dominant model for data representation across the World Wide Web. The space efficiency of the serialized data structure is shown to improve query response time and to eliminate the need to rebuild the DOM representation each time an existing XML document is queried. Moreover, this serialized data structure enables DOM modeling of extremely large XML documents at a low and fixed memory cost. New algorithms and software tools for working with these data structures have been developed, allowing compatibility with standard DOM modeling. The structures, tools, and techniques presented can be readily adapted for use on the Semantic Web as it becomes a more prominent feature of the internet.

The improved performance of the XML database storage and retrieval application developed in this thesis helps close the gap between relational database application performance and native XML database (NXD) application performance. In addition, the novel DOM querying technique developed in this thesis has potential application on handheld devices, autonomous robots, and other devices that have limited memory.

The research evolved from an examination of three principal limitations that restrict the widespread implementation of Semantic Web technologies. These limitations are: (1) Querying information stored in XML format. Representation in XML format is hierarchical, and queries are less efficient and more time-consuming than comparable queries of data stored in relational database format [1]. (2) Complexity of tools and technologies. The suite of technologies, useful ontologies, and tools necessary for creating Semantic Web technologies is complex, and development time is extensive [2]. (3) Insufficient user-friendly interfaces. Users with no prior exposure to semantic technologies, such as Description Logic (DL), cannot easily utilize the power of semantic search capabilities. This thesis addresses the first of these factors inhibiting the growth of the Semantic Web. It is shown that serializing the data representation of an XML document enables significant improvement of XML-DOM query response times. This serialized representation also eliminates the restrictions on document size imposed by most XML-DOM query systems.

Chapters 2 and 3 introduce and motivate the research described in this thesis. Chapters 4, 5 and 6 explain the basic costs inherent in DOM modeling, expand on the influence of DOM tree size and shape on the total cost of DOM modeling, and enumerate the reasons why storing the in-memory DOM tree of an XML document is traditionally considered to take 2 to 5 times the size of the source XML document. Chapters 7 and 8 introduce methods for reducing the size of DOM trees and establish upper and lower bounds for DOM tree size in relation to source XML documents; they establish a practical upper bound on the memory required for supporting XML DOM querying and show that it is smaller than the 2 to 5 times figure described above. Chapter 9 introduces the Minimum DOM Node (MDN), a minimized data structure capable of storing DOM node information, and shows how an MDN Array (MDNA) can be used as an alternative to a traditional DOM tree. Chapter 10 examines the memory cost of using an MDNA for DOM modeling and shows that the MDNA model of an XML document allows DOM querying with a low and fixed allocation of memory. Chapter 11 details the process and costs involved in creating an MDNA from a given source XML document and is a major contribution of this work. Chapter 12 examines the W3C specification for the DOM interface and explains how an MDNA can be used to satisfy the requirements of core DOM node operations. Chapter 13 presents a C/C++ implementation of an MDNA parser, developed to prove the concepts presented in this thesis. Chapter 14 examines the future research topics that follow logically from these developments.
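
The general idea of replacing a pointer-based DOM tree with an array of small fixed-size node records can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the record layout, the field order, and the helper names `build_node_array`, `node`, and `children` are hypothetical, since the abstract does not specify the actual MDN format, and the thesis implementation is in C/C++. The sketch handles elements only (no attributes or text) and stream-parses the input, so the only state besides the array itself is proportional to the nesting depth.

```python
import struct
import xml.parsers.expat

# Hypothetical record layout (NOT the thesis's MDN format):
# node type, name id, parent index, first-child index, next-sibling index.
RECORD = struct.Struct("<Biiii")
ELEMENT = 1   # same numeric code as DOM's ELEMENT_NODE
NONE = -1     # index meaning "no such node"


def build_node_array(xml_text):
    """Stream-parse xml_text into a flat bytearray of fixed-size records."""
    records = bytearray()
    names, name_ids = [], {}   # interned element names
    stack = []                 # indices of the currently open elements
    last_child = {}            # open parent index -> its most recent child

    def patch(index, first_child=None, next_sibling=None):
        # Rewrite one link field of an already stored record in place.
        t, n, p, c, s = RECORD.unpack_from(records, index * RECORD.size)
        c = c if first_child is None else first_child
        s = s if next_sibling is None else next_sibling
        RECORD.pack_into(records, index * RECORD.size, t, n, p, c, s)

    def start(name, attrs):
        if name not in name_ids:
            name_ids[name] = len(names)
            names.append(name)
        index = len(records) // RECORD.size
        parent = stack[-1] if stack else NONE
        records.extend(RECORD.pack(ELEMENT, name_ids[name], parent, NONE, NONE))
        if parent != NONE:
            prev = last_child.get(parent, NONE)
            if prev == NONE:
                patch(parent, first_child=index)
            else:
                patch(prev, next_sibling=index)
            last_child[parent] = index
        stack.append(index)

    def end(name):
        # A closed element can gain no more children; drop its bookkeeping.
        last_child.pop(stack.pop(), None)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return records, names


def node(records, index):
    """Return (type, name_id, parent, first_child, next_sibling) for a record."""
    return RECORD.unpack_from(records, index * RECORD.size)


def children(records, index):
    """Yield child indices by following first-child and next-sibling links."""
    child = node(records, index)[3]
    while child != NONE:
        yield child
        child = node(records, child)[4]


if __name__ == "__main__":
    records, names = build_node_array("<a><b/><c><d/></c><b/></a>")
    print([names[node(records, i)[1]] for i in children(records, 0)])
    # prints ['b', 'c', 'b']
```

Because every record has the same size, record `i` sits at byte offset `i * RECORD.size`, so the array could in principle be written to a file and read back one record at a time instead of being rebuilt from the XML source. Whether the thesis's MDNA works this way is not stated in the abstract.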

Bibliographic record

Author: Meyer, Matthew K.
Author affiliation: City University of New York
Degree-granting institution: City University of New York
Discipline: Computer Science
Degree: Ph.D.
Year: 2012
Pages: 166 p.
Original format: PDF
Language of text: English
Date indexed: 2022-08-17 11:43:03

Similar literature

1. Efficient Processing of XML Documents in Hadoop Map Reduce [J]. Dmitry Vasilenko, Mahesh Kurapati. International Journal on Computer Science and Engineering, 2014, No. 9.
2. Clustered Chain Path Index for XML Document: Efficiently Processing Branch Queries [J]. Hongqiang Wang, Jianzhong Li, Hongzhi Wang. World Wide Web, 2008, No. 1.
3. Schema-aware labelling of XML documents for efficient query and update processing in SemCrypt [J]. Katharina Gruen, Michael Karlinger, Michael Schrefl. International Journal of Computer Systems Science & Engineering, 2006, No. 1.
4. Clustered Absolute Path Index for XML Document: On Efficient Processing of Twig Queries [C]. Hongqiang Wang, Jianzhong Li, Hongzhi Wang. APWeb 2006 International Workshops: XRA, IWSN, MEGA, and ICSE; January 16-18, 2006; Harbin, China. 2006.
5. XML2REL: An efficient system for storing and querying XML documents using relational databases [D]. Atay, Mustafa. 2006.
6. Using XML Metadata to Enable the Automatic Generation and Processing of HTML Forms from XML Documents [O]. Anil K. Dubey, Henry C. Chueh. 2001.
7. Comparing Document Object Model (DOM) and simple API for XML (SAX) in processing XML document in leave application system [O]. Wahid Juliana. 2008.
