An Effective and Efficient Approach for Keyword-Based XML Retrieval

Abstract

IR-style keyword-based search over XML documents has become the most common tool for XML querying, because users need not know the structure of the target XML document before constructing a query. For a keyword-based XML search engine, the key issue is how to efficiently return sets of meaningfully related nodes in response to a user's query. A common solution in current approaches is to store the relationship between each pair of nodes in an XML document in an index, which clearly leads to serious storage overhead. In this paper, we propose an enhanced inverted index structure (PN-Inverted Index) that stores path information in addition to node IDs, and we adopt the concept of the lowest common ancestor (LCA) and extend it to PLCA. Efficient algorithms based on these concepts are designed to check the relationship among an arbitrary number of nodes. Compared with existing approaches, our approach need not create an additional relationship index; it simply uses the inverted index that is already common in IR-style keyword search engines. Experimental results show that, while still returning meaningful answers, our search engine offers substantial performance benefits. Although the size of the inverted index increases, the total size of the search engine's indices is smaller than that of existing approaches.
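
The idea of keeping path information inside each posting can be illustrated with a minimal sketch. The abstract does not give the layout of the PN-Inverted Index or the definition of PLCA, so everything below is an assumption made for illustration: paths are encoded as tuples of child positions from the root (a Dewey-style encoding), the names `build_index` and `lca_path` are hypothetical, and the function computes only the plain LCA as the longest common path prefix, not the paper's PLCA.

```python
from collections import defaultdict

def build_index(nodes):
    """Build a toy inverted index whose postings carry (node_id, path).

    `nodes` is an iterable of (node_id, path, text); `path` is an assumed
    Dewey-style tuple of child positions from the root, not the paper's format.
    """
    index = defaultdict(list)
    for node_id, path, text in nodes:
        for term in set(text.lower().split()):
            index[term].append((node_id, path))
    return index

def lca_path(paths):
    """Return the path of the lowest common ancestor of any number of nodes,
    computed as the longest common prefix of their root-to-node paths."""
    prefix = []
    for steps in zip(*paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return tuple(prefix)

# Hypothetical document: a root with two "book" children, each with two leaves.
nodes = [
    (1, (0,), "bib"),
    (2, (0, 0), "book"),
    (3, (0, 0, 0), "XML retrieval"),
    (4, (0, 0, 1), "Smith"),
    (5, (0, 1), "book"),
    (6, (0, 1, 0), "keyword search"),
    (7, (0, 1, 1), "Jones"),
]
index = build_index(nodes)

xml_path = index["xml"][0][1]
smith_path = index["smith"][0][1]
jones_path = index["jones"][0][1]

same_book = lca_path([xml_path, smith_path])               # ancestor is the first book
across_books = lca_path([xml_path, smith_path, jones_path])  # ancestor is the root
```

Because each posting already carries its path, the common ancestor of several matched nodes can be found from the postings alone, without a separate pairwise relationship index. That is the storage trade-off the abstract describes, shown here only in simplified form.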
