International Conference on Database Systems for Advanced Applications (DASFAA 2005); April 17-20, 2005; Beijing, China

Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques


Abstract

We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as a query whose path expression contains the descendant-or-self axis "//". In its general form, a partial match query has branch predicates that form branching paths. The objective of XIR is to support this type of query efficiently for large-scale documents with heterogeneous schemas. XIR builds on the conventional schema-level methods that use relational tables, and significantly improves their efficiency with two techniques: an inverted index and a novel prefix match join. The former indexes the labels in label paths as if they were keywords in text, which allows the label paths matching a query to be found more efficiently than with the string matching used in the conventional methods. The latter supports branching path expressions and allows the result nodes to be found more efficiently than with the containment joins used in the conventional methods. We compare the efficiency of XIR with that of XRel and XParent using XML documents crawled from the Internet. The results show that XIR is more efficient than both XRel and XParent by several orders of magnitude for linear path expressions, and by several factors for branching path expressions.
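The idea of indexing the labels of label paths as keywords can be illustrated with a small sketch. This is not the XIR implementation, whose details the abstract does not give. It is a minimal in-memory example that assumes label paths are plain strings such as `/bib/book/author` and that queries use only the `/` and `//` steps, with no branch predicates. The function names `build_index`, `parse_query`, and `match_paths` and the sample paths are hypothetical.

```python
import re

def build_index(label_paths):
    """Positional inverted index: label -> {path_id: [positions in that path]}."""
    index = {}
    for pid, path in enumerate(label_paths):
        for pos, label in enumerate(path.strip("/").split("/")):
            index.setdefault(label, {}).setdefault(pid, []).append(pos)
    return index

def parse_query(query):
    """Split '/a//b' into [('/', 'a'), ('//', 'b')]."""
    return re.findall(r"(//|/)([^/]+)", query)

def match_paths(query, label_paths, index):
    """Return the label paths matched by a linear path query."""
    steps = parse_query(query)
    # Keyword-style lookup: keep only paths that contain every query label.
    candidates = None
    for _, label in steps:
        ids = set(index.get(label, {}))
        candidates = ids if candidates is None else candidates & ids
    matches = []
    for pid in sorted(candidates or ()):
        last = len(label_paths[pid].strip("/").split("/")) - 1
        reachable = {-1}  # positions where the previous step may have matched
        for axis, label in steps:
            positions = index[label][pid]
            if axis == "/":
                # Child step: the label must directly follow a reachable position.
                reachable = {p for p in positions if p - 1 in reachable}
            else:
                # Descendant-or-self step: any later position qualifies.
                lowest = min(reachable)
                reachable = {p for p in positions if p > lowest}
            if not reachable:
                break
        # The final query label must be the last label of the path.
        if last in reachable:
            matches.append(label_paths[pid])
    return matches

# Hypothetical label paths, for illustration only.
paths = [
    "/bib/book/author",
    "/bib/article/author",
    "/bib/book/title",
    "/bib/book/author/name",
]
index = build_index(paths)
print(match_paths("//book//author", paths, index))
print(match_paths("/bib//name", paths, index))
```

The posting-list intersection discards most label paths before any position check runs, which is the intuition behind preferring an index lookup to matching a pattern string against every stored path.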
