Conference: IEEE International Congress on Big Data

Scalable XPath Evaluation on Large-Scale Continuously Evolving XML Repositories

Abstract

Continuously Evolving XML (CEXML) documents are important for representing constantly changing information in a number of emerging domains, such as software configuration management and geographical information systems. A CEXML document consists of multiple versions of an XML document as it evolves over time. Evaluating XPath expressions in large CEXML repositories is inherently challenging because of the additional temporal dimension. This paper introduces an important class of XPath queries for CEXML documents called version-specific XPath expressions (VS-XPath). We present a scalable and efficient framework for VS-XPath evaluation on CEXML repositories. Our framework is a novel adaptation of the interval-based indexing scheme, and it incorporates several unique features. First, we significantly reduce index computation and storage costs by selectively indexing interspersed subsets of the versions of CEXML documents. Second, we present a set of algorithms that use the available indices to obtain first-cut solutions of XPath queries and then refine those solutions by taking into account the edits occurring between versions. Third, we propose a unique method to drastically prune the edits that must be processed when evaluating an XPath expression, thereby providing significant performance gains. This paper also reports a detailed experimental study demonstrating the scalability and efficiency benefits of the proposed framework in terms of indexing costs, query latencies, and storage costs.
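The abstract names two building blocks without showing them: interval-based indexing of XML and indexing only some versions. The following is a minimal sketch of both ideas under stated assumptions. It is not the authors' implementation.

It uses the generic region encoding that interval-based XML indexes build on, where each element gets (start, end, level). It answers `//A//D` and `//A/D` steps with a naive nested-loop structural join. It also includes a hypothetical policy in which only every k-th version is indexed.

All names here are illustrative assumptions: `Region`, `label`, `structural_join`, `plan_version_query`, and the parameter `k`. The paper's edit-based refinement and edit-pruning method are not modeled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical interval label: (start, end) brackets the element's subtree."""
    tag: str
    start: int
    end: int
    level: int


def label(root):
    """Assign (start, end, level) labels with one depth-first traversal."""
    regions = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        start = counter  # position on entering the element
        counter += 1
        for child in elem:
            visit(child, level + 1)
        end = counter  # position on leaving it, after all descendants
        counter += 1
        regions.append(Region(elem.tag, start, end, level))

    visit(root, 0)
    return regions


def structural_join(regions, anc_tag, desc_tag, child_only=False):
    """Naive nested-loop join for //anc_tag//desc_tag (or //anc_tag/desc_tag).

    a contains d iff a.start < d.start and d.end < a.end; the child axis
    additionally requires d.level == a.level + 1.
    """
    ancestors = [r for r in regions if r.tag == anc_tag]
    hits = [
        d for d in regions
        if d.tag == desc_tag and any(
            a.start < d.start and d.end < a.end
            and (not child_only or d.level == a.level + 1)
            for a in ancestors)
    ]
    return sorted(hits, key=lambda r: r.start)


def plan_version_query(version, k):
    """Hypothetical sampling policy: only versions 0, k, 2k, ... are indexed.

    Returns the indexed base version and the later versions whose edit
    scripts would have to be replayed to refine a first-cut answer.
    """
    base = (version // k) * k
    return base, list(range(base + 1, version + 1))


if __name__ == "__main__":
    doc = ET.fromstring("<lib><book><title/><ch><title/></ch></book><title/></lib>")
    regions = label(doc)
    # [(2, 3), (5, 6)]: both titles nested under <book>, not the trailing one
    print([(r.start, r.end) for r in structural_join(regions, "book", "title")])
    # [(2, 3)]: only the direct child title
    print([(r.start, r.end) for r in structural_join(regions, "book", "title", child_only=True)])
    # (5, [6, 7]): start from indexed version 5, replay edits of versions 6 and 7
    print(plan_version_query(7, 5))
```

With this encoding, the ancestor test is just two integer comparisons, which is what makes interval schemes attractive. The catch is that maintaining such labels for every version is costly. The sampling policy trades that storage against replaying the edits between an indexed version and the queried one. The abstract says the paper reduces this replay cost further by pruning edits; that step is not attempted here.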