首页> 外文期刊>Information Systems >Tree edit distance: Robust and memory-efficient
【24h】

Tree edit distance: Robust and memory-efficient

机译:树的编辑距离:健壮且高效存储

获取原文
获取原文并翻译 | 示例
           

摘要

Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art algorithm for the tree edit distance is RTED, which is guaranteed to be fast independent of the tree shape. Unfortunately, this algorithm requires up to twice the memory of its competitors. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation. In this paper we present a new, memory efficient algorithm for the tree edit distance, AP-TED (All Path Tree Edit Distance). Our algorithm runs at least as fast as RTED without trading in memory efficiency. This is achieved by releasing memory early during the first step of the algorithm, which computes a decomposition strategy for the actual distance computation. We show the correctness of our approach and prove an upper bound for the memory usage. The strategy computed by AP-TED is optimal in the class of all-path strategies, which subsumes the class of LRH strategies used in RTED. We further present the AP-TED+ algorithm, which requires less computational effort for very small subtrees and improves the runtime of the distance computation. Our experimental evaluation confirms the low memory requirements and the runtime efficiency of our approach. (C) 2015 Elsevier Ltd. All rights reserved.
机译:分层数据通常被建模为树。一个有趣的查询可以识别成对的相似树。树相似度的标准方法是树编辑距离,树编辑距离已成功应用于各种应用中。就运行时间而言,树编辑距离的最新算法为RTED,该算法可确保快速独立于树的形状。不幸的是,该算法需要的存储量是其竞争对手的两倍。存储器在树的大小上是平方的,并且是树编辑距离计算的瓶颈。在本文中,我们提出了一种新的,内存有效的树编辑距离算法AP-TED(全路径树编辑距离)。我们的算法的运行速度至少与RTED一样快,而不会牺牲内存效率。这是通过在算法的第一步中提早释放内存来实现的,该算法为实际距离计算计算分解策略。我们展示了我们方法的正确性,并证明了内存使用的上限。 AP-TED计算的策略在全路径策略类别中是最佳的,它包含了RTED中使用的LRH策略类别。我们进一步提出了AP-TED +算法,该算法对于非常小的子树需要较少的计算工作,并改善了距离计算的运行时间。我们的实验评估证实了我们方法的低内存需求和运行时效率。 (C)2015 Elsevier Ltd.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号