Source: The VLDB Journal

Structural consistency: enabling XML keyword search to eliminate spurious results consistently


Abstract

XML keyword search is a user-friendly way to query XML data using only keywords. In XML keyword search, to achieve high precision without sacrificing recall, it is important to remove spurious results not intended by the user. Efforts to eliminate spurious results have enjoyed some success using the concept of LCA or its variants, SLCA and MLCA. However, existing methods can still return many spurious results. The fundamental cause is that existing methods try to eliminate spurious results locally, without a global examination of all the query results, so some spurious results are not eliminated consistently. In this paper, we propose a novel keyword search method that removes spurious results consistently by exploiting the new concept of structural consistency. We define structural consistency as a property that is preserved if no query result has an ancestor-descendant relationship at the schema level with any other query result. A naive solution to obtain structural consistency would be to compute all the LCAs (or variants) and then to remove spurious results according to structural consistency. Obviously, this approach would always be slower than existing LCA-based ones. To speed up structural consistency checking, we must be able to examine the query results at the schema level without generating all the LCAs. However, this is a challenging problem since the schema-level query results do not homomorphically map to the instance-level query results, causing serious false dismissal. We present a comprehensive and practical solution to this problem and formally prove that this solution preserves structural consistency at the schema level without incurring false dismissal. We also propose a relevance-feedback-based solution for the case where our method has low recall, which occurs when it is not the user’s intention to find more specific results.
This solution has been prototyped in Odysseus, a full-fledged object-relational DBMS developed at KAIST. Experimental results using real and synthetic data sets show that, compared with the state-of-the-art methods, our solution significantly (1) improves precision while providing comparable recall for most queries and (2) enhances the query performance by removing spurious results early.
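
The naive solution described in the abstract (compute all results first, then filter them by structural consistency) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's algorithm: the result representation (a Dewey-style node id paired with a schema path of element labels), the function names, and the choice to discard the ancestor side of each ancestor-descendant pair are all assumptions made here. The last assumption is suggested, but not stated, by the abstract's remark about users wanting more specific results.

```python
from typing import List, Tuple

# Hypothetical representation of one query result:
# (dewey_id of the result node, schema path as a tuple of element labels).
SchemaPath = Tuple[str, ...]
Result = Tuple[str, SchemaPath]


def is_schema_ancestor(a: SchemaPath, b: SchemaPath) -> bool:
    """Return True if schema path `a` is a proper prefix of schema path `b`."""
    return len(a) < len(b) and b[:len(a)] == a


def enforce_structural_consistency(results: List[Result]) -> List[Result]:
    """Drop every result whose schema path is an ancestor of another
    result's schema path (assumption: the more specific side is kept)."""
    paths = {path for _, path in results}
    return [
        (node, path)
        for node, path in results
        if not any(is_schema_ancestor(path, other) for other in paths)
    ]


if __name__ == "__main__":
    # Made-up LCA-style results over a toy bibliography document.
    candidates: List[Result] = [
        ("1", ("bib",)),
        ("1.1", ("bib", "book")),
        ("1.2", ("bib", "book")),
        ("1.2.3", ("bib", "book", "chapter")),
        ("1.5", ("bib", "article")),
    ]
    for node, path in enforce_structural_consistency(candidates):
        print(node, "/".join(path))
```

In this toy input, the result at node `1.1` is dropped even though no other result lies beneath it at the instance level, because its schema path `bib/book` is a schema-level ancestor of `bib/book/chapter`. The check looks at all results globally instead of each one locally, which is the sense of "consistently" in the abstract. The sketch still generates every candidate before filtering, which is exactly the cost the paper aims to avoid.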

