Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Learning probabilistic models of tree edit distance

Abstract

Nowadays, there is a growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years in order to improve its computational efficiency. However, used in its classical form, the tree ED needs a priori fixed edit costs, which are often difficult to tune and which leave little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, thus providing a discriminative model. To tackle these tasks, we present an adaptation of the expectation-maximization algorithm for learning these distributions over the primitive edit costs. Two experiments are conducted. The first is carried out on artificial data and confirms the benefit of learning a tree ED rather than imposing edit costs a priori; the second is applied to a pattern recognition task aiming to classify handwritten digits. (c) 2008 Elsevier Ltd. All rights reserved.
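
The abstract's remark about a priori fixed edit costs can be illustrated with a minimal sketch of the classical (non-learned) ordered tree ED. This is not the paper's learning algorithm: it is the standard memoized forest recursion, and the tuple-based tree encoding, the names `tree` and `tree_edit_distance`, and the unit default costs are all assumptions chosen for illustration.

```python
from functools import lru_cache


def tree(label, *children):
    """Hypothetical encoding: a tree is (label, tuple_of_child_trees)."""
    return (label, tuple(children))


def tree_edit_distance(t1, t2, del_cost=1.0, ins_cost=1.0, sub_cost=1.0):
    """Classical ordered tree ED with fixed, hand-chosen costs.

    A forest is a tuple of trees. The recursion always works on the
    rightmost root of each forest. Intended for small trees only.
    """

    @lru_cache(maxsize=None)
    def d(f, g):
        if not f and not g:
            return 0.0
        if not g:
            # Delete the rightmost root of f; its children join the forest.
            _, kids = f[-1]
            return d(f[:-1] + kids, g) + del_cost
        if not f:
            # Insert the rightmost root of g.
            _, kids = g[-1]
            return d(f, g[:-1] + kids) + ins_cost
        (lv, kv), (lw, kw) = f[-1], g[-1]
        relabel = 0.0 if lv == lw else sub_cost
        return min(
            d(f[:-1] + kv, g) + del_cost,             # delete root v
            d(f, g[:-1] + kw) + ins_cost,             # insert root w
            d(kv, kw) + d(f[:-1], g[:-1]) + relabel,  # match v with w
        )

    return d((t1,), (t2,))


if __name__ == "__main__":
    a = tree("a", tree("b"), tree("c"))
    b = tree("a", tree("b"), tree("d"))
    print(tree_edit_distance(a, b))                # one relabel
    print(tree_edit_distance(a, b, sub_cost=3.0))  # delete + insert is cheaper
```

With unit costs the two example trees differ by a single relabel, whereas raising `sub_cost` to 3 makes a deletion followed by an insertion the cheaper script. The distance therefore depends directly on the hand-chosen costs, which is the tuning problem the abstract proposes to address by learning them.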
