【24h】

The q-Gram Distance for Ordered Unlabeled Trees

机译:有序无标签树的q-Gram距离

获取原文
获取原文并翻译 | 示例

摘要

In this paper, we investigate the q-Gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram as simply a tree with q nodes isomorphic to a line graph, and the q-gram distance between two trees as similar as one between two strings. Then, by using the depth sequence based on postorder, we design the algorithm EnumGram to enumerate all q-grams in a tree T with n nodes which runs in O(n~2) time and in O(q) space. Furthermore, we improve it to the algorithm LinearEnumGram which runs in O(qn) time and in O(qd) space, where d is the depth of T. Hence, we can evaluate the q-gram distance D_q(T_1,T_2) between T_1 and T_2 in O(q max {n_1, n_2}) time and in O(q max {d_1, d_2}) space, where n_i and d_i are the number of nodes in T_i and the depth of T_i, respectively. Finally, we show the relationship between the q-gram distance D_q(T_1, T_2) and the edit distance E(T_1, T_2) that D_q(T_1, T_2) ≤ (gl + 1) E(T_1, T_2), where g = max{g_1, g_2}, l = max{l_1, l_2}, g_i is the degree of T_i and l_i is the number of leaves in T_i. In particular, for the top-down tree edit distance F(T_1, T_2), this result implies that D_q(T_1, T_2) ≤ min{g~(q-2), l - 1} F(T_1,T_2).
机译:在本文中,我们研究了有序未标记树(简称树)的q-Gram距离。首先,我们将q-gram简化为一棵树,其中q个节点与折线图同构,并且两棵树之间的q-gram距离类似于两个字符串之间的距离。然后,通过使用基于后置顺序的深度序列,我们设计了EnumGram算法来枚举具有T个树T的所有q-gram,其中n个节点在O(n〜2)时间和O(q)空间中运行。此外,我们将其改进为在O(qn)时间和O(qd)空间中运行的LinearEnumGram算法,其中d是T的深度。因此,我们可以评估之间的q-gram距离D_q(T_1,T_2)在O(q max {n_1,n_2})时间和O(q max {d_1,d_2})空间中的T_1和T_2,其中n_i和d_i分别是T_i中的节点数和T_i的深度。最后,我们显示q-gram距离D_q(T_1,T_2)和编辑距离E(T_1,T_2)之间的关系,其中D_q(T_1,T_2)≤(gl + 1)E(T_1,T_2),其中g = max {g_1,g_2},l = max {l_1,l_2},g_i是T_i的度数,l_i是T_i中的叶数。特别是,对于自上而下的树编辑距离F(T_1,T_2),此结果意味着D_q(T_1,T_2)≤min {g〜(q-2),-1} F(T_1,T_2)。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号