Grammatical Inference: Algorithms and Applications; Lecture Notes in Artificial Intelligence; 4201

A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer


Abstract

Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. In addition, the learned transducers, which are neither deterministic nor stochastic in the standard terminology, are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain a kind of context-sensitive edit distance.
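To make the idea of a conditional, probabilistic edit model concrete, here is a minimal Python sketch. It assumes a single-state (memoryless) model, which is much simpler than the learned, context-dependent transducers described above, and its parameter values are hand-set for illustration rather than learned (no grammatical inference or training step is shown). The names `ConditionalEditModel`, `ins`, `dele`, `sub` and `end` are hypothetical. For each input symbol, the probabilities of inserting, deleting or substituting are normalized to 1, so that for a fixed input x the probabilities p(y | x) sum to 1 over all outputs y; this is the sense in which such a model is conditional and does not depend on how the inputs x are distributed. A forward recurrence sums over all edit scripts turning x into y, and -log p(y | x) then plays the role of an edit distance with probabilistic costs.

```python
import math
from typing import Dict, Tuple

# Hypothetical type for substitution probabilities: (input symbol, output symbol) -> p
Sub = Dict[Tuple[str, str], float]


class ConditionalEditModel:
    """Memoryless conditional edit model defining p(y | x) (illustrative sketch).

    Assumed normalization (checked in __init__):
      for every input symbol a:  sum_b ins[b] + dele[a] + sum_b sub[(a, b)] == 1
      once the input is consumed: sum_b ins[b] + end == 1
    """

    def __init__(self, ins: Dict[str, float], dele: Dict[str, float], sub: Sub):
        self.ins, self.dele, self.sub = ins, dele, sub
        total_ins = sum(ins.values())
        # Probability of stopping once the whole input has been read.
        self.end = 1.0 - total_ins
        if self.end <= 0.0:
            raise ValueError("insertion probabilities must sum to less than 1")
        for a in dele:
            consume = dele[a] + sum(p for (s, _), p in sub.items() if s == a)
            if abs(total_ins + consume - 1.0) > 1e-9:
                raise ValueError(f"parameters for input symbol {a!r} do not sum to 1")

    def prob(self, x: str, y: str) -> float:
        """Forward algorithm: sum the probabilities of all edit scripts turning x into y."""
        n, m = len(x), len(y)
        # alpha[i][j]: probability of having read x[:i] and written y[:j]
        alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
        alpha[0][0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if j > 0:  # insert y[j-1]
                    alpha[i][j] += alpha[i][j - 1] * self.ins.get(y[j - 1], 0.0)
                if i > 0:  # delete x[i-1]
                    alpha[i][j] += alpha[i - 1][j] * self.dele.get(x[i - 1], 0.0)
                if i > 0 and j > 0:  # substitute (or match) x[i-1] -> y[j-1]
                    alpha[i][j] += alpha[i - 1][j - 1] * self.sub.get((x[i - 1], y[j - 1]), 0.0)
        return alpha[n][m] * self.end

    def distance(self, x: str, y: str) -> float:
        """Edit distance as negative log-likelihood: -log p(y | x)."""
        p = self.prob(x, y)
        return math.inf if p == 0.0 else -math.log(p)


# Illustrative (not learned) parameters over the alphabet {a, b}.
model = ConditionalEditModel(
    ins={"a": 0.05, "b": 0.05},
    dele={"a": 0.1, "b": 0.1},
    sub={("a", "a"): 0.7, ("a", "b"): 0.1, ("b", "b"): 0.7, ("b", "a"): 0.1},
)
print(model.distance("ab", "ab"), model.distance("ab", "ba"))
```

Operations given high probability (here, identity substitutions) get a low cost -log p, while rare ones get a high cost. In a learned model these probabilities would be estimated from example string pairs instead of being set by hand, and a model with several states could make them depend on the surrounding context.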
