Parallel XML and XPath Parsing

Abstract

XML has been widely adopted across a spectrum of applications. Its parsing efficiency, however, remains a concern and can become a bottleneck. XPath is a query language used to locate and select content in an XML document, so improving the performance of XPath processing is important for many applications. With the prevalence of multicore CPUs, parallelization is a promising way to improve performance.

This dissertation investigates approaches to parallelizing DOM-style XML parsing. We first developed an overall solution that decomposes the XML document into well-formed fragments at well-defined points, based on the output of an initial preparsing phase. We then focused on parallelizing the preparsing stage, which is the major bottleneck. Building on earlier research, we extend this work by examining how speculation can improve performance, using an approach we call a p-DFA, which avoids computing low-probability possibilities.

Effectively parallelizing XPath is challenging. With a large number of XPath queries, it is hard to divide them evenly among processors. There are, however, opportunities. First, many queries focus on different location steps, so they can be processed on different processors. Second, idle processors can steal jobs from busy ones. The problem is how to keep a query's execution continuous once it has already executed some of its location steps. We investigated an approach that builds on YFilter and divides its NFA into several smaller NFAs for concurrent processing. We implemented and tested two load-balancing strategies: a static approach and a dynamic approach with work stealing.

Another line of research investigates parallel XPath parsing based on TwigM, which focuses on streaming data. Using the state machine that the TwigM algorithm constructs in advance, we derive all the needed information from partially received data. We then discuss how to divide tasks on the fly in two steps: the first parses the XML while creating tasks, and the second assigns the resulting XPath tasks to multiple threads and finally merges the results. Experiments with these approaches show good speedup and scalability.
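The speculative preparsing idea summarized above can be illustrated with a minimal sketch. It is not the dissertation's p-DFA implementation: the three-state DFA (text, inside a tag, inside a double-quoted attribute value), the start-state probabilities, the threshold, and all function names are hypothetical, and the DFA ignores comments, CDATA sections, and single-quoted values. The document is split into chunks. Each chunk after the first is run only from start states whose assumed probability meets the threshold, and a chunk whose true start state was skipped is re-run sequentially during the stitching pass. Python threads show the structure but, because of the GIL, would not give real speedup for this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy preparsing DFA (simplified: no comments, CDATA,
# or single-quoted attribute values).
TEXT, TAG, QUOTE = 0, 1, 2
STATES = (TEXT, TAG, QUOTE)


def step(state: int, ch: str) -> int:
    """One DFA transition."""
    if state == TEXT:
        return TAG if ch == "<" else TEXT
    if state == TAG:
        if ch == ">":
            return TEXT
        return QUOTE if ch == '"' else TAG
    return TAG if ch == '"' else QUOTE  # state == QUOTE


def run_chunk(chunk: str, offset: int, start: int):
    """Run the DFA over one chunk from a given start state.

    Returns the end state and the absolute offsets of every '<' that
    opens a tag (i.e. every '<' seen while in the TEXT state).
    """
    state, tag_starts = start, []
    for i, ch in enumerate(chunk):
        if state == TEXT and ch == "<":
            tag_starts.append(offset + i)
        state = step(state, ch)
    return state, tag_starts


def speculative_preparse(doc: str, n_chunks: int, priors: dict, threshold: float):
    """Hypothetical p-DFA-style sketch, not the dissertation's code.

    Chunk 0 starts in TEXT. Every other chunk is run only from start
    states whose prior probability is >= threshold; low-probability
    states are skipped. If a chunk's true start state was skipped, that
    chunk is re-run sequentially (misspeculation recovery).
    """
    size = max(1, -(-len(doc) // n_chunks))  # ceiling division
    chunks = [(doc[i:i + size], i) for i in range(0, len(doc), size)]
    likely = [s for s in STATES if priors.get(s, 0.0) >= threshold]

    with ThreadPoolExecutor() as pool:
        futures = []
        for idx, (chunk, off) in enumerate(chunks):
            starts = [TEXT] if idx == 0 else likely
            futures.append({s: pool.submit(run_chunk, chunk, off, s) for s in starts})
        results = [{s: f.result() for s, f in fs.items()} for fs in futures]

    # Sequential stitch: each chunk's end state selects the next chunk's result.
    state, tag_starts, reruns = TEXT, [], 0
    for (chunk, off), table in zip(chunks, results):
        if state in table:
            state, found = table[state]
        else:
            reruns += 1
            state, found = run_chunk(chunk, off, state)
        tag_starts.extend(found)
    return tag_starts, reruns
```

For example, with `doc = '<a x="1>2"><b>hi</b><c/></a>'`, four chunks, priors `{TEXT: 0.7, TAG: 0.25, QUOTE: 0.05}`, and a threshold of 0.2, the second chunk begins inside the quoted value `"1>2"`, so its skipped QUOTE start state forces one re-run; the returned tag-start offsets are `[0, 11, 16, 20, 24]`, the same as a sequential scan. Raising the threshold saves speculative work on most chunks at the cost of more re-runs when speculation fails.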

Bibliographic record

  • Author: Zhang, Ying
  • Affiliation: State University of New York at Binghamton
  • Degree-granting institution: State University of New York at Binghamton
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 130
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Aquatic products, fisheries
  • Date added: 2022-08-17 11:53:01