Journal: Computer and Modernization (计算机与现代化)

An XML Pattern Matching Algorithm Supporting Wildcard Queries

Abstract

In XML query languages, queries containing the wildcard "*" conveniently and effectively satisfy certain special query requirements. In the big data era, however, XML files keep growing in size and structural complexity. Existing algorithms that support wildcard queries consume huge amounts of memory to parse XML documents. Handling nested wildcards also requires many single-path matching operations and the caching of partial results. To address this, and building on existing classic algorithms, we propose WTwigList, a new algorithm that efficiently evaluates twig patterns containing the wildcard "*". WTwigList first processes the hierarchical relationships among the wildcards in the query pattern to eliminate unnecessary wildcard matching. It then parses the XML file as a data stream and performs local Extended Dewey encoding. After a filtering step, it obtains ordered lists of leaf-node encodings and runs the matching on these lists to produce the query results. Extensive experiments on both real-life and synthetic datasets show that, compared with existing algorithms, WTwigList effectively improves query efficiency and has some advantage in space efficiency. It also handles parent-child (P-C) relationships in query patterns quickly and accurately.
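The abstract does not spell out WTwigList's data structures. What follows is therefore only a minimal Python sketch of the general ingredients it names, built on our own assumptions, and every function name in it is hypothetical.

Extended Dewey labels are computed as in the TJFast scheme. A child whose tag is the k-th entry of its parent's sorted child-tag set CT(parent) gets a component x with x mod |CT(parent)| = k. This lets a node's whole root-to-node label path be decoded from its code alone. Here CT is collected in a first streaming pass over the document instead of being read from a DTD. Text nodes are not labelled, and the "local" encoding mentioned in the abstract is not modelled.

The wildcard step is only our guess at what processing the hierarchical relationships of "*" might mean. Runs of "*" steps are folded into level-distance constraints between named steps, so that a/*/*/b becomes "b exactly three levels below a" and no wildcard is matched individually. The output is a document-ordered, filtered list of codes for one query leaf.

```python
import io
import xml.etree.ElementTree as ET
from functools import lru_cache


def collect_child_tags(xml_text):
    """Pass 1 (assumed stand-in for a DTD): sorted child-tag list CT(t) for every tag t."""
    ct, stack = {}, []
    for event, elem in ET.iterparse(io.BytesIO(xml_text.encode()), events=("start", "end")):
        if event == "start":
            if stack:
                ct.setdefault(stack[-1], set()).add(elem.tag)
            stack.append(elem.tag)
        else:
            stack.pop()
            elem.clear()  # streaming: drop the subtree once it is closed
    return {tag: sorted(kids) for tag, kids in ct.items()}


def extended_dewey(xml_text, ct):
    """Pass 2: yield (tag, code) for every element in document order (TJFast-style labels)."""
    stack = []  # one [tag, code, last component given to a child] entry per open element
    for event, elem in ET.iterparse(io.BytesIO(xml_text.encode()), events=("start", "end")):
        if event == "end":
            stack.pop()
            elem.clear()
            continue
        if not stack:
            code = ()  # the root element gets the empty label
        else:
            parent = stack[-1]
            kids = ct[parent[0]]
            n, k = len(kids), kids.index(elem.tag)
            y = parent[2]
            # first child: x = k; later children: the smallest x > y with x % n == k
            x = k if y is None else (y // n + (y % n >= k)) * n + k
            parent[2] = x
            code = parent[1] + (x,)
        stack.append([elem.tag, code, None])
        yield elem.tag, code


def decode(code, root_tag, ct):
    """Recover the root-to-node label path from the code alone: tag = CT(parent)[x % n]."""
    path = [root_tag]
    for x in code:
        kids = ct[path[-1]]
        path.append(kids[x % len(kids)])
    return path


def compress_wildcards(steps):
    """Fold runs of '*' steps into (name, level gap, exact?) constraints (our assumption)."""
    out, gap, exact = [], 0, True
    for axis, name in steps:  # axis is "/" (P-C) or "//" (A-D)
        gap += 1
        exact = exact and axis == "/"
        if name != "*":
            out.append((name, gap, exact))
            gap, exact = 0, True
    if gap:  # trailing '*': the query leaf itself is a wildcard
        out.append(("*", gap, exact))
    return out


def path_matches(cons, labels):
    """Match the compressed constraints against one root-to-node label path."""
    n = len(labels)

    @lru_cache(maxsize=None)
    def go(j, prev):  # prev: depth matched by constraint j-1 (0 = document node)
        name, gap, exact = cons[j]
        last = j == len(cons) - 1
        for d in ([prev + gap] if exact else range(prev + gap, n + 1)):
            if d > n or (name != "*" and labels[d - 1] != name):
                continue
            if (d == n) if last else go(j + 1, d):  # the last step must hit the node itself
                return True
        return False

    return go(0, 0)


def leaf_stream(xml_text, steps):
    """Document-ordered codes of elements whose label path satisfies the path query."""
    ct = collect_child_tags(xml_text)
    cons = compress_wildcards(steps)
    leaf, root_tag, out = cons[-1][0], None, []
    for tag, code in extended_dewey(xml_text, ct):
        root_tag = root_tag or tag  # the first element seen is the root
        if leaf in ("*", tag) and path_matches(cons, decode(code, root_tag, ct)):
            out.append(code)
    return out
```

As a usage example, take doc = "<a><b><c/><d><c/></d></b><c/></a>". Then leaf_stream(doc, [("/", "a"), ("/", "*"), ("/", "c")]) returns [(0, 0)], and leaf_stream(doc, [("//", "b"), ("//", "c")]) returns [(0, 0), (0, 1, 0)]. The sketch only filters single root-to-leaf paths. Joining the per-leaf lists at the branching nodes of a twig, for instance by comparing code prefixes, is left out.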