IEEE International Congress on Big Data

Scalable XPath Evaluation on Large-Scale Continuously Evolving XML Repositories

Abstract

Continuously Evolving XML (CEXML) documents are important for representing constantly changing information in a number of emerging domains, such as software configuration management and geographical information systems. A CEXML document consists of the multiple versions of an XML document as it evolves over time. Evaluating XPath expressions over large CEXML repositories is inherently challenging because of the additional temporal dimension. This paper introduces an important class of XPath queries for CEXML documents, called version-specific XPath expressions (VS-XPath), and presents a scalable and efficient framework for evaluating them on CEXML repositories. The framework is a novel adaptation of the interval-based indexing scheme and incorporates several distinguishing features. First, it significantly reduces index computation and storage costs by selectively indexing interspersed subsets of the versions of a CEXML document. Second, it provides a set of algorithms that use the available indices to obtain first-cut solutions to XPath queries and then refine those solutions by taking into account the edits that occur between versions. Third, it uses a unique method to drastically prune the edits that must be processed when evaluating an XPath expression, which yields significant performance gains. The paper also reports a detailed experimental study demonstrating the scalability and efficiency of the proposed framework in terms of indexing costs, query latencies, and storage costs.
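The abstract names an interval-based indexing scheme but gives no details. The sketch below is a minimal illustration that assumes the classic region encoding. A depth-first traversal assigns each element a (start, end, level) label, so an ancestor's interval strictly encloses every descendant's interval. This lets a `//a//d` step be answered by comparing integers rather than walking the tree.

The sketch also shows, in a simplified form, the idea of indexing only some versions. It assumes a hypothetical policy in which every `stride`-th version is indexed, so a query on any other version starts from the nearest indexed version.

The function names `interval_encode`, `descendant_pairs`, and `nearest_indexed_version`, and the `stride` parameter, are hypothetical. This is not the authors' framework, and their edit-based refinement and edit-pruning steps are not shown.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Interval(NamedTuple):
    """Region label for one element: [start, end] encloses all descendants."""
    tag: str
    start: int
    end: int
    level: int


def interval_encode(root: ET.Element) -> list[Interval]:
    """Label every element with (start, end, level) in one depth-first pass.

    A shared counter is bumped on entering and again on leaving each element,
    so a descendant's interval is strictly nested inside its ancestor's.
    """
    labels: list[Interval] = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        end = counter
        counter += 1
        labels.append(Interval(elem.tag, start, end, level))

    visit(root, 0)
    labels.sort(key=lambda iv: iv.start)  # restore document order
    return labels


def descendant_pairs(labels: list[Interval], anc_tag: str, desc_tag: str):
    """Evaluate //anc_tag//desc_tag as a naive nested-loop structural join."""
    ancestors = [iv for iv in labels if iv.tag == anc_tag]
    descendants = [iv for iv in labels if iv.tag == desc_tag]
    return [
        (a, d)
        for a in ancestors
        for d in descendants
        if a.start < d.start and d.end < a.end  # strict interval containment
    ]


def nearest_indexed_version(version: int, stride: int) -> int:
    """Hypothetical selective-indexing policy: only every stride-th version
    is indexed, so a query on `version` starts from the closest indexed
    version at or before it (the edits in between would then be replayed)."""
    return version - version % stride


if __name__ == "__main__":
    doc = ET.fromstring(
        "<lib><book><title/></book><book><title/></book><title/></lib>"
    )
    labels = interval_encode(doc)
    print(len(descendant_pairs(labels, "book", "title")))  # 2
    print(nearest_indexed_version(17, 5))  # 15
```

The top-level `title` is correctly excluded from the result because its interval lies outside both `book` intervals. A production system would replace the nested loop with a sort-merge structural join over intervals kept in document order.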