Algorithms for XML stream processing : massive data, external memory and scalable performance

Abstract

Many modern applications require processing massive streams of XML data, which poses difficult technical challenges. Among these are the design and implementation of applications that optimize the processing of XPath queries and provide accurate cost estimates for such queries over a massive stream of XML data. In this thesis, we propose a novel performance prediction model that estimates, a priori, the cost (in terms of space used and time spent) of any structural query belonging to Forward XPath. To this end, we perform an experimental study that confirms a linear relationship between stream-processing and data-access resources. On this basis, we introduce a mathematical model (linear regression functions) to predict the cost of a given XPath query. We also introduce a new selectivity estimation technique consisting of two elements. The first is the path tree structure synopsis: a concise, accurate, and convenient summary of the structure of an XML document. The second is the selectivity estimation algorithm: an efficient stream-querying algorithm that traverses the path tree synopsis to estimate the values of the cost parameters, which the mathematical model then uses to determine the cost of a given XPath query. We compare the performance of our model with existing approaches. Finally, we present a use case for an online stream-querying system. The system uses our performance prediction model to estimate the time and memory cost of a given XPath query, and it provides an accurate answer to the query's sender. This use case illustrates the practical advantages of performance management with our techniques.
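
The abstract does not give the internals of the synopsis or the estimation algorithm, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes a path tree in which each node stands for one distinct root-to-element label path and stores how many elements share that path, built in a single streaming pass. Selectivity is estimated for predicate-free Forward XPath queries using only the child (`/`) and descendant (`//`) axes with name or `*` tests. The names `PathNode`, `build_path_tree`, `estimate_selectivity` and `fit_linear` are hypothetical, and treating the match count as the single regressor of a linear cost model is an illustrative assumption, not the thesis's actual set of cost parameters.

```python
import io
import re
import xml.etree.ElementTree as ET


class PathNode:
    """One path-tree node (hypothetical): a distinct root-to-element label path."""

    def __init__(self, label):
        self.label = label
        self.count = 0       # elements in the document with exactly this path
        self.children = {}   # child label -> PathNode


def build_path_tree(xml_text):
    """Build the path tree in one streaming pass over start/end events."""
    root = PathNode(None)    # virtual root above the document element
    stack = [root]
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            parent = stack[-1]
            node = parent.children.get(elem.tag)
            if node is None:
                node = parent.children[elem.tag] = PathNode(elem.tag)
            node.count += 1
            stack.append(node)
        else:
            stack.pop()
            elem.clear()     # discard element content; only the synopsis is kept
    return root


def parse_query(xpath):
    """Split e.g. '/site//item' into (axis, name-test) steps; no predicates."""
    return [("descendant" if sep == "//" else "child", name)
            for sep, name in re.findall(r"(//|/)([^/]+)", xpath)]


def descendants(node):
    """Yield every proper descendant of a path-tree node."""
    for child in node.children.values():
        yield child
        yield from descendants(child)


def estimate_selectivity(tree, xpath):
    """Count elements matched by a predicate-free child/descendant query."""
    current = {tree}
    for axis, test in parse_query(xpath):
        matched = set()      # a set, so each path node is counted once
        for node in current:
            candidates = node.children.values() if axis == "child" else descendants(node)
            matched.update(c for c in candidates if test == "*" or c.label == test)
        current = matched
    return sum(node.count for node in current)


def fit_linear(xs, ys):
    """Ordinary least squares for cost = a + b * x with one cost parameter x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


if __name__ == "__main__":
    doc = ("<site><regions><item><name/></item><item><name/></item></regions>"
           "<people><person><name/></person></people></site>")
    tree = build_path_tree(doc)
    print(estimate_selectivity(tree, "//name"))        # 3
    print(estimate_selectivity(tree, "/site/*/item"))  # 2
```

For queries of this restricted form the count is exact, because whether an element matches depends only on its label path, which is exactly what a path-tree node records. The coefficients returned by `fit_linear` would have to come from measured query runs; none are reproduced here, and the predicted cost would then be `a + b * estimate_selectivity(tree, query)`.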

Bibliographic record

  • Author: Alrammal Muath
  • Year: 2011
  • Original format: PDF
  • Language: English