Journal: Information Systems

An efficient similarity-based approach for comparing XML documents



Abstract

XML documents are widely used to interchange information among heterogeneous systems, ranging from office applications to scientific experiments. Regardless of the domain, XML documents may evolve, so identifying and understanding the changes they undergo becomes crucial. Some syntactic diff approaches have been proposed to address this problem. They are mainly designed to compare revisions of XML documents using explicit IDs to match elements. However, elements in different revisions may not share IDs because of tool incompatibility or divergent or missing schemas. In this paper, we present Phoenix, a similarity-based approach for comparing revisions of XML documents that does not rely on explicit IDs. Phoenix uses dynamic programming and optimization algorithms to compare different features of XML documents (e.g., element name, content, attributes, and sub-elements) and to calculate the degree of similarity between them. We compared Phoenix with X-Diff and XyDiff, two state-of-the-art XML diff algorithms. XyDiff was the fastest approach but failed to provide precise matching results. X-Diff achieved higher efficacy in 30 of the 56 scenarios but was slow. Phoenix ran in a fraction of the time required by X-Diff and achieved the best efficacy in 26 of the 56 tested scenarios. In our evaluations, Phoenix was by far the most efficient approach for matching elements across revisions of the same XML document. © 2018 Elsevier Ltd. All rights reserved.
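To make the idea of ID-free, similarity-based matching concrete, here is a minimal sketch in Python. It is not the Phoenix implementation: the function names, the equal weighting of the four features, and the greedy pairing of sub-elements are all assumptions made for illustration. The abstract only states that Phoenix combines dynamic programming and optimization algorithms over element name, content, attributes, and sub-elements. In this sketch, dynamic programming appears as a longest-common-subsequence comparison of text content, and a simple greedy pairing stands in for a proper optimization step.

```python
import xml.etree.ElementTree as ET


def lcs_ratio(a: str, b: str) -> float:
    """Text similarity in [0, 1] from the longest common subsequence (dynamic programming)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def attribute_similarity(x: ET.Element, y: ET.Element) -> float:
    """Fraction of attribute names (from either element) that carry the same value in both."""
    keys = set(x.attrib) | set(y.attrib)
    if not keys:
        return 1.0
    same = sum(1 for k in keys if x.attrib.get(k) == y.attrib.get(k))
    return same / len(keys)


def children_similarity(x: ET.Element, y: ET.Element) -> float:
    """Pair sub-elements greedily by descending similarity (assumed stand-in for an optimal assignment)."""
    xs, ys = list(x), list(y)
    if not xs and not ys:
        return 1.0
    if not xs or not ys:
        return 0.0
    scored = sorted(
        ((element_similarity(cx, cy), i, j)
         for i, cx in enumerate(xs)
         for j, cy in enumerate(ys)),
        reverse=True,
    )
    used_i, used_j, total = set(), set(), 0.0
    for score, i, j in scored:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            total += score
    # Unmatched children on the longer side contribute zero.
    return total / max(len(xs), len(ys))


def element_similarity(x: ET.Element, y: ET.Element) -> float:
    """Average of four feature similarities; the equal weights are a hypothetical choice."""
    name = 1.0 if x.tag == y.tag else 0.0
    text = lcs_ratio((x.text or "").strip(), (y.text or "").strip())
    return (name + text + attribute_similarity(x, y) + children_similarity(x, y)) / 4


if __name__ == "__main__":
    old = ET.fromstring('<person id="1"><name>Ana</name><role>dev</role></person>')
    new = ET.fromstring('<person><name>Ana M.</name><role>dev</role></person>')
    print(element_similarity(old, new))
```

The two revisions in the usage example share no ID, yet the score stays well above zero because names, most of the content, and the sub-element structure still agree. A real matcher would likely replace the greedy pairing with an assignment algorithm and tune the feature weights.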
