Algorithms for management of document-centric XML data.

Abstract

XML, initially designed for large-scale text publishing, has rapidly evolved into a standard for a wide variety of data exchange and representation applications. As the volume of data has grown, XML data management has become the subject of intensive research. Database research groups have concentrated on building database management frameworks around semistructured data represented as XML. At the same time, humanities research groups have concentrated on developing application-specific XML-compliant markup languages and on applying XML to the encoding of a wide array of documents.

Two major kinds of XML documents emerge from applications: data-centric and document-centric. Data-centric documents are characterized by a fairly regular structure and occur as a standard format for data exchange and for the representation of semistructured data. Document-centric XML has, in general, a much more irregular structure and is often encountered as a means of document markup. In recent years, a number of applications of XML to document-centric encoding have led to markup that could not be stored in a single hierarchical XML document (the concurrent markup hierarchies problem). This is mainly a consequence of the multi-hierarchical nature of text documents: the physical location hierarchy (document pages and lines), the text structure hierarchy (paragraphs, sentences, and words), and so on. A prominent example of document-centric XML with multiple hierarchies is the XML encoding of manuscript folio images: the heterogeneous information to be encoded (from text and images) is very rarely hierarchical.

The problem of concurrent markup hierarchies in document-centric XML encodings has attracted the attention of a number of humanities researchers in recent years. Previously proposed solutions to this problem rely on the XML expertise of humans and on their ability to maintain correct schemas for complex markup languages. This thesis introduces a framework that allows humans to concentrate on the semantic aspects of the encoding, while leaving the burden of maintaining XML documents to the software. We formally define the notions of concurrent markup hierarchies and concurrent XML documents, and we give algorithms for document-centric XML data management, with a special focus on document-centric XML documents with concurrent markup.
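The concurrent markup hierarchies problem described in the abstract can be illustrated with a small sketch. This is not the thesis's framework or any of its algorithms. It only assumes that each markup element can be modeled as a character range over the text. The `Span` type, the helper names, and the sample sentence are all hypothetical. The sketch shows that a physical hierarchy (lines) and a text structure hierarchy (sentences) can each nest cleanly on their own, yet conflict when merged into one tree.

```python
from itertools import combinations
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical model of a markup element covering text[start:end]."""
    tag: str
    start: int
    end: int


def overlaps_improperly(a: Span, b: Span) -> bool:
    """True when a and b intersect but neither one contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def conflicts(spans):
    """Return every pair of spans that cannot be nested in a single tree."""
    return [(a, b) for a, b in combinations(spans, 2)
            if overlaps_improperly(a, b)]


text = "The quick fox jumps. It runs away."

# Physical location hierarchy: the line break falls after "The quick fox ".
physical = [Span("line", 0, 14), Span("line", 14, len(text))]

# Text structure hierarchy: two sentences.
structure = [Span("s", 0, 20), Span("s", 21, len(text))]

# Each hierarchy alone is well nested, but together they are not:
# the second line starts in the middle of the first sentence.
for a, b in conflicts(physical + structure):
    print(f"<{a.tag}> {a.start}-{a.end} overlaps <{b.tag}> {b.start}-{b.end}")
```

Under these assumptions, the only conflicting pair is the second line and the first sentence, which is the kind of overlap a single well-formed XML document cannot express directly.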

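Bibliographic record

  • Author

    Iacob, Ionut Emil

  • Author affiliation

    University of Kentucky

  • Degree-granting institution: University of Kentucky
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 212 p.
  • Total pages: 212
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology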