A fine-grained XML structural comparison approach

Abstract

As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them rely on techniques for finding the edit distance between tree structures, with XML documents modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several structural similarity aspects, namely sub-tree related similarities, which are not sufficiently addressed when comparing XML documents. In this paper, we provide an improved comparison method that deals with fine-grained sub-trees and leaf node repetitions, without increasing overall complexity with respect to current XML comparison methods. Our approach consists of two main algorithms: one for discovering the structural commonality between sub-trees, and one for computing tree-based edit operation costs. A prototype has been developed to evaluate the optimality and performance of our method. Experimental results, on both real and synthetic XML data, demonstrate better performance with respect to alternative XML comparison methods.
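To make the baseline idea concrete, the sketch below models XML documents as ordered labeled trees and computes a simple top-down edit distance between them. This is not the paper's algorithm: it is a classic restricted (Selkow-style) formulation in which whole sub-trees are inserted or deleted at a cost equal to their node count, and relabeling a node costs 1. The function names (`to_tree`, `size`, `tree_distance`, `xml_distance`) and the unit costs are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def to_tree(elem):
    """Convert an Element into a hashable (label, children) tuple.

    Only element tags are kept; text and attributes are ignored
    (an assumption made to focus on structure alone).
    """
    return (elem.tag, tuple(to_tree(child) for child in elem))


def size(tree):
    """Number of nodes in the tree (assumed cost of inserting/deleting it)."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Selkow-style edit distance between two ordered labeled trees."""
    relabel = 0 if a[0] == b[0] else 1
    kids_a, kids_b = a[1], b[1]
    m, n = len(kids_a), len(kids_b)

    # dist[i][j]: cost of turning the first i children of a
    # into the first j children of b.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + size(kids_a[i - 1]),      # delete sub-tree
                dist[i][j - 1] + size(kids_b[j - 1]),      # insert sub-tree
                dist[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1]),  # transform
            )
    return relabel + dist[m][n]


def xml_distance(xml_a, xml_b):
    """Parse two XML strings and return their structural edit distance."""
    return tree_distance(to_tree(ET.fromstring(xml_a)),
                         to_tree(ET.fromstring(xml_b)))


if __name__ == "__main__":
    doc_a = "<book><title/><author/><author/></book>"
    doc_b = "<book><title/><author/><year/></book>"
    print(xml_distance(doc_a, doc_b))
```

In this baseline, a deleted or inserted sub-tree is always charged its full size, even when a structurally similar sub-tree exists elsewhere in the other document. That is the kind of sub-tree related similarity the abstract says current approaches do not sufficiently address.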