Learning state machine-based string edit kernels

Abstract

Over the past few years, several approaches have been proposed for deriving string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. Marginalized kernels, on the other hand, compute the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings x and x' built from an alphabet Sigma requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) calculating an infinite sum over Sigma* by resorting to the intersection of probabilistic automata, as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.
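
The abstract does not state the kernel formula. As a rough sketch only, assuming the kernel takes the usual marginalized form over conditional edit distributions, the infinite sum over Sigma* mentioned in step (ii) might be written as follows. The symbol p_e is an assumed name for the learned conditional edit probability (the probability of producing y by editing x), and \Sigma^* denotes the set of all finite strings over the alphabet.

```latex
K(x, x') \;=\; \sum_{y \in \Sigma^*} p_e(y \mid x)\, p_e(y \mid x')
```

Under this reading, each term is nonzero for infinitely many y whenever insertions have nonzero probability, so the sum cannot be enumerated directly. That is consistent with the abstract's recourse to intersecting probabilistic automata rather than summing term by term.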