Conference: IEEE International Conference on Semantic Computing

ELF: A Constraint-Aware XQuery Engine for Processing XML Streams with Minimized Memory Footprint



Abstract

XML and XQuery [1] have been widely accepted as the standard data representation and query language for web applications. When the input consists of a large number of XML tokens, the main-memory buffer requirement of XML stream processing can be significant, which may in turn lead to significant CPU consumption because of the cost of manipulating the buffered data. XQuery evaluation over XML streams therefore faces serious memory-utilization challenges when real-time responses are required.

In many practical applications, XML streams are generated following a pre-defined semantic constraint such as a Document Type Definition (DTD) or an XML Schema [1], as the following two scenarios show:

Network Traffic Monitoring. To monitor network traffic, anomalies in the traffic flow may need to be detected from statistical data sent as XML streams. In such a case the XML stream, generated by a workflow engine or simply by a customized program, follows a pre-defined schema.

News Publishing. In this scenario, the news server retrieves news from a large number of sources, such as reporter devices, broadcast agencies and government sources, and disseminates messages as an XML stream to subscribers. The sources may all agree on a pre-defined schema.

Utilizing such semantic knowledge of the stream input enables us to predict, on the fly, the non-occurrence of a given pattern within a bound context. This helps to avoid buffering data and to release buffered data earlier, thus achieving a minimized memory footprint. Consider the query below:

FOR $a IN $ROOT/news WHERE $a/location = "Boston" RETURN $a/entry, $a/comment

Without semantic knowledge, for each bound news element, the earliest point at which we can check the predicate on locations, output the entries and comments, and then release the corresponding buffer is after the news element has been completely received.
Assume we are given the semantics of the news element type as the DTD below: [DTD not reproduced in the source]. Given such schema knowledge of the input stream, if two consecutive advertisement subelements are encountered within a news element, we can guarantee that no more location elements can occur under this news element. If none of the locations received within the news element equals "Boston", the buffered entries of the binding can be purged from memory, and subsequent comments can be dropped directly without any buffering, because this news element is guaranteed to fail the predicate. More examples of query optimization using such semantic knowledge can be found in [5].

State of the Art. Reducing memory consumption is very important for stream applications. Only a limited number of XML stream processing engines [3] [6] [4] have looked at schema-based query optimization opportunities. Among them, FluXQuery [4] does not support filtering-related optimizations. The focus of [6] and [3] is not on buffer minimization; they only statically capture limited constraints from the given schema.

ELF Solution. In this demonstration we propose ELF (Constraint-aware XQuery Engine for Processing XML Streams with Minimized Memory Footprint). Given the DTD of the input stream, ELF detects Pattern Non-Occurrence (PNO) constraints [5] on the fly and then utilizes this runtime constraint knowledge to adjust the buffering strategy dynamically, thus effectively decreasing memory consumption.
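
The early-purge reasoning in the news example can be illustrated with a small Python sketch. It is not ELF's implementation. It assumes the subelements of one news element arrive as (tag, text) pairs, and it hard-codes the single rule stated in the abstract (two consecutive advertisement subelements mean no further location can occur) instead of deriving it from a DTD. The function name `evaluate_news` and the `use_pno` flag are hypothetical, introduced only for this illustration.

```python
def evaluate_news(children, use_pno=True):
    """Evaluate the example query over the subelements of ONE news element.

    children: iterable of (tag, text) pairs in arrival order (assumed format).
    use_pno:  if True, apply the hard-coded non-occurrence rule from the
              abstract; if False, buffer until the news element ends.
    Returns (result, peak), where peak is the largest number of items
    buffered at any point.
    """
    entries, comments = [], []   # buffers for $a/entry and $a/comment
    matched = False              # a location equal to "Boston" was seen
    failed = False               # predicate is guaranteed to fail
    prev_tag = None
    peak = 0

    for tag, text in children:
        if failed:
            # The news element cannot qualify: drop everything unbuffered.
            continue
        if tag == "location":
            matched = matched or text == "Boston"
        elif tag == "entry":
            entries.append(text)
        elif tag == "comment":
            comments.append(text)
        elif (use_pno and tag == "advertisement"
              and prev_tag == "advertisement" and not matched):
            # Assumed rule: no more location can occur, and none matched,
            # so the buffered data can be purged right away.
            failed = True
            entries.clear()
            comments.clear()
        prev_tag = tag
        peak = max(peak, len(entries) + len(comments))

    # Output order follows the RETURN clause: entries, then comments.
    result = entries + comments if matched else []
    return result, peak


failing_news = [
    ("location", "New York"), ("entry", "e1"), ("entry", "e2"),
    ("advertisement", ""), ("advertisement", ""),
    ("comment", "c1"), ("comment", "c2"),
]
print(evaluate_news(failing_news, use_pno=True))
print(evaluate_news(failing_news, use_pno=False))
```

In this sketch both calls return an empty result for the failing news element, but with the rule enabled the two buffered entries are purged at the second advertisement and the later comments are never buffered, so the peak buffer size is smaller.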
