Journal: Data & Knowledge Engineering

Fragment-based approximate retrieval in highly heterogeneous XML collections

Abstract

Due to the heterogeneous nature of XML data for internet applications, exact matching of queries is often inadequate. The need arises to quickly identify subtrees of XML documents in a collection that are similar to a given pattern. Similarity involves both tags, which are not required to coincide, and structure, in which not all the relationships among nodes in the tree are strictly preserved. In this paper we present an efficient approach to identifying similar subtrees that relies on ad hoc indexing structures. The approach makes it possible to quickly detect, in a heterogeneous document collection, the minimal portions that exhibit some similarity to the pattern. These candidate portions are then ranked according to their actual similarity. The approach supports different notions of similarity, so it can be customized for different application domains. Three similarity measures are proposed and compared. The approach is validated experimentally, and the experimental results are discussed in detail.
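
The general idea of matching a pattern against subtrees while tolerating different tags and relaxed structure can be illustrated with a small sketch. This is not the paper's algorithm: it uses no index, the scoring function is not one of the paper's three measures, and the string ratio from Python's `difflib` is only a stand-in for a real tag-similarity notion. All function names (`tag_similarity`, `score_subtree`, `rank_subtrees`), the threshold, and the sample documents are assumptions made for illustration.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def tag_similarity(a: str, b: str) -> float:
    """Stand-in tag similarity: 1.0 for equal tags, otherwise a string ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_subtree(pattern: ET.Element, candidate: ET.Element) -> float:
    """Average, over pattern tags, of the best-matching tag in the candidate.

    Parent/child relationships are ignored entirely, which is a crude way
    of not requiring the structure to be strictly preserved.
    """
    pattern_tags = [el.tag for el in pattern.iter()]
    candidate_tags = [el.tag for el in candidate.iter()]
    best = [max(tag_similarity(p, c) for c in candidate_tags) for p in pattern_tags]
    return sum(best) / len(best)


def rank_subtrees(pattern: ET.Element, document: ET.Element, threshold: float = 0.6):
    """Score every subtree of the document and rank those above the threshold.

    Ties are broken in favour of smaller subtrees, a rough proxy for
    preferring minimal portions of the document.
    """
    scored = []
    for node in document.iter():
        score = score_subtree(pattern, node)
        if score >= threshold:
            size = sum(1 for _ in node.iter())
            scored.append((score, size, node))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(score, node) for score, _, node in scored]


# Hypothetical pattern and document, invented for this example.
pattern = ET.fromstring("<book><author/><title/></book>")
document = ET.fromstring(
    "<library><books><book><title/><author/></book></books>"
    "<journal><editor/></journal></library>"
)

ranked = rank_subtrees(pattern, document)
for score, node in ranked:
    print(f"{score:.2f}  <{node.tag}>")
```

In this sketch the `<book>` element ranks first even though its children appear in a different order than in the pattern, because only tag similarity is scored. An exhaustive scan like this one visits every subtree, which is the cost that purpose-built indexing structures are meant to avoid.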
