XQueC: Pushing Queries to Compressed XML Data

Abstract

Initially proposed as a data interchange format, XML also aims to become a format for data storage and management. However, XML documents in their textual form are rather verbose and tend to consume considerable disk space, due to the textual and repetitive nature of the XML tags and of several XML types. One solution to this space occupancy problem is to compress XML.

The XMill project proposed an XML-specific compression method: it compresses the structure (XML tags) separately from the content (data nodes, the leaves of the XML tree), which is split into a set of semantically uniform containers. For example, one container stores the text values of all elements with a given tag in the document, another stores all <phoneNo> values, and so on. Each container is then compressed separately, using the compression algorithm best suited to it; thus, XMill makes maximal use of the structural commonalities inherent among semantically similar items. However, an XMill-compressed document is opaque to a query processor: one must fully decompress a chunk of data before being able to query it.

The XGrind project [9] pioneered the field of query processing on compressed XML documents. XGrind does not separate data from structure: an XGrind-compressed XML document is still an XML document, whose tags have been dictionary-encoded and whose data nodes have been compressed with the Huffman algorithm [6] and left in place in the document. XGrind's query processor can be viewed as an extended SAX parser that handles exact-match and prefix-match queries on compressed values, and partial-match and range queries on decompressed values. However, XGrind does not support several operations in the compressed domain, such as non-equality selections, joins, aggregations, nested queries, or construct operations. Such operations occur in many XML query scenarios, as illustrated by XML benchmarks (e.g., all but the first two of the 20 queries in XMark [8]).
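
To make the XMill idea concrete, here is a minimal Python sketch, not XMill's actual implementation. It separates an XML document's tag structure from its text values, groups the values into one container per tag (XMill's real grouping rules are more flexible), and compresses each container on its own with zlib (XMill instead picks a suitable compressor per container). All function names are hypothetical, and attributes and mixed-content tail text are ignored to keep it short. Note that `values_of` must decompress an entire container before any value in it can be inspected, which is exactly the opacity problem the abstract points out.

```python
import xml.etree.ElementTree as ET
import zlib
from collections import defaultdict
from xml.sax.saxutils import escape

SEP = "\0"  # NUL cannot appear in XML text or tag names, so it is a safe separator


def split_into_containers(xml_text):
    """Separate the tag structure from the text values, one container per tag.

    Simplifying assumptions (not XMill's real rules): attributes and tail text
    are ignored, and containers are keyed by tag name rather than by path.
    """
    root = ET.fromstring(xml_text)
    structure = []                  # "<tag" opens, "#tag" = next value of that container, ">" closes
    containers = defaultdict(list)  # tag -> text values in document order

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
            structure.append("#" + elem.tag)
        for child in elem:
            walk(child)
        structure.append(">")

    walk(root)
    return structure, dict(containers)


def compress(xml_text):
    """Compress the structure stream and every container independently (zlib for all)."""
    structure, containers = split_into_containers(xml_text)
    packed_structure = zlib.compress(SEP.join(structure).encode("utf-8"))
    packed = {tag: zlib.compress(SEP.join(values).encode("utf-8"))
              for tag, values in containers.items()}
    return packed_structure, packed


def values_of(packed, tag):
    """Reading even one value requires decompressing the whole container."""
    return zlib.decompress(packed[tag]).decode("utf-8").split(SEP)


def reconstruct(packed_structure, packed):
    """Rebuild the document by replaying the structure stream against the containers."""
    events = zlib.decompress(packed_structure).decode("utf-8").split(SEP)
    queues = {tag: iter(values_of(packed, tag)) for tag in packed}
    out, open_tags = [], []
    for event in events:
        if event.startswith("<"):
            open_tags.append(event[1:])
            out.append("<" + event[1:] + ">")
        elif event.startswith("#"):
            out.append(escape(next(queues[event[1:]])))
        else:  # ">" closes the most recently opened element
            out.append("</" + open_tags.pop() + ">")
    return "".join(out)


doc = ("<people><person><name>Ann</name><phoneNo>555-1234</phoneNo></person>"
       "<person><name>Bob</name><phoneNo>555-9876</phoneNo></person></people>")
packed_structure, packed = compress(doc)
print(values_of(packed, "phoneNo"))                   # ['555-1234', '555-9876']
print(reconstruct(packed_structure, packed) == doc)   # True
```

Keying containers by tag is the simplest choice that still places semantically similar values (all phone numbers, all names) next to each other, which is what lets a general-purpose compressor find more redundancy than in the interleaved original.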
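
The exact-match and prefix-match capabilities attributed to XGrind follow from a property of a fixed (non-adaptive) Huffman code: equal strings always encode to equal bit strings, and because the code is prefix-free, a query constant p is a string prefix of a value v exactly when the encoding of p is a bit prefix of the encoding of v. The Python sketch below illustrates this under simplifying assumptions: one static code built from a single list of values, bit strings held as Python `str` for readability, and hypothetical helper names. Range predicates fall back to decompression because Huffman codes do not preserve the lexicographic order of the values.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(values):
    """Build one static (non-adaptive) Huffman code, char -> bit string, from all values."""
    freq = Counter("".join(values))
    if len(freq) == 1:  # a single distinct symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + bits for ch, bits in left.items()}
        merged.update({ch: "1" + bits for ch, bits in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(code, text):
    return "".join(code[ch] for ch in text)


def decode(code, bits):
    inverse = {b: ch for ch, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free code: the first hit is the only possible one
            out.append(inverse[current])
            current = ""
    return "".join(out)


def encode_query(code, constant):
    """Encode a query constant once; None if it uses a symbol absent from the data."""
    if any(ch not in code for ch in constant):
        return None
    return encode(code, constant)


def exact_match(code, compressed, constant):
    target = encode_query(code, constant)  # compare bits, never decompress
    return [] if target is None else [i for i, b in enumerate(compressed) if b == target]


def prefix_match(code, compressed, prefix):
    target = encode_query(code, prefix)  # string prefix <=> bit prefix
    return [] if target is None else [i for i, b in enumerate(compressed) if b.startswith(target)]


def range_match(code, compressed, low, high):
    # Huffman codes do not preserve lexicographic order, so decompress first
    return [i for i, b in enumerate(compressed) if low <= decode(code, b) <= high]


names = ["Smith", "Jones", "Smythe", "Brown"]
code = build_huffman_code(names)
compressed = [encode(code, n) for n in names]
print(exact_match(code, compressed, "Smith"))      # [0]
print(prefix_match(code, compressed, "Sm"))        # [0, 2]
print(range_match(code, compressed, "C", "T"))     # [0, 1, 2]
```

The query constant has to be encoded with the same code as the stored values; that is why the sketch builds one code up front and rejects constants containing symbols the code does not cover.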
