Conference: International Colloquium on Grammatical Inference

A Discriminative Model of Stochastic Edit Distance in the Form of a Conditional Transducer

Abstract

Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, computed by using grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. Besides being neither deterministic nor stochastic in the standard terminology, the learned transducers are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are used. In other words, we obtain context-sensitive edit distances.
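To make the notion of per-operation costs concrete, the following is a minimal sketch of the classic dynamic-programming edit distance with pluggable costs for insertion, deletion and substitution. It illustrates the hand-tuned baseline that the abstract describes, not the learned conditional transducer proposed in the paper. The function name `weighted_edit_distance` and the parameters `ins_cost`, `del_cost` and `sub_cost` are hypothetical names chosen for this illustration.

```python
from typing import Callable


def weighted_edit_distance(
    x: str,
    y: str,
    ins_cost: Callable[[str], float] = lambda b: 1.0,
    del_cost: Callable[[str], float] = lambda a: 1.0,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
) -> float:
    """Cheapest total cost of turning x into y with the given edit costs.

    With the default unit costs this is the standard Levenshtein distance.
    The cost functions are illustrative placeholders (an assumption of this
    sketch); in the hand-tuned setting they would be set by an expert.
    """
    n, m = len(x), len(y)
    # d[i][j] = cheapest cost of turning x[:i] into y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])  # delete every symbol of x[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])  # insert every symbol of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(x[i - 1]),                # deletion
                d[i][j - 1] + ins_cost(y[j - 1]),                # insertion
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitution or match
            )
    return d[n][m]


# Unit costs recover the Levenshtein distance.
print(weighted_edit_distance("kitten", "sitting"))  # 3.0
```

In this sketch each cost depends only on the symbols involved in the operation. The context-sensitive distances mentioned in the abstract would additionally let a cost depend on where in the string the operation is applied, which this simple baseline does not model.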
