Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams

Abstract

Any cross-language processing application must first tackle the problem of transliteration when it faces a language written in another script. The obvious solution is to use existing transliteration tools, but these tools are usually not suitable for all purposes, and for some script pairs they do not exist at all. Our aim is to discriminate transliterations across different scripts in a unified way, using a learning method that builds a transliteration model from a set of transliterated proper names. We compare two strings with an algorithm that computes a Levenshtein edit distance using n-gram costs. The evaluations carried out show that our similarity measure is accurate.
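As an illustration of the idea in the abstract, the following is a minimal sketch of an edit distance whose operations may rewrite an n-gram of one string into an n-gram of the other at a looked-up cost. It is not the paper's implementation: the function name `ngram_edit_distance`, the `costs` dictionary layout, the `max_n` and `default_cost` parameters, and the example cost values are all assumptions made for this sketch. With an empty cost table it reduces to the ordinary Levenshtein distance.

```python
from typing import Dict, Tuple


def ngram_edit_distance(
    src: str,
    tgt: str,
    costs: Dict[Tuple[str, str], float],
    max_n: int = 3,
    default_cost: float = 1.0,
) -> float:
    """Edit distance in which one edit may map an n-gram of src to an n-gram of tgt.

    costs maps (source n-gram, target n-gram) to a cost; an empty string on
    either side stands for an insertion or a deletion. Pairs missing from the
    table fall back to unit-cost single-character edits (hypothetical default).
    """
    m, n = len(src), len(tgt)
    inf = float("inf")
    # dp[i][j] = cheapest way to rewrite src[:i] into tgt[:j]
    dp = [[inf] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0.0

    for i in range(m + 1):
        for j in range(n + 1):
            if dp[i][j] == inf:
                continue
            # Consume a characters of src and b characters of tgt in one edit.
            for a in range(max_n + 1):
                if i + a > m:
                    break
                for b in range(max_n + 1):
                    if a == 0 and b == 0:
                        continue
                    if j + b > n:
                        break
                    s, t = src[i:i + a], tgt[j:j + b]
                    if (s, t) in costs:
                        step = costs[(s, t)]
                    elif a <= 1 and b <= 1:
                        # Plain Levenshtein fallback: match, substitute, insert, delete.
                        step = 0.0 if s == t else default_cost
                    else:
                        continue  # multi-character edits need an explicit cost
                    if dp[i][j] + step < dp[i + a][j + b]:
                        dp[i + a][j + b] = dp[i][j] + step
    return dp[m][n]


# Hand-picked illustrative costs (not learned values from the paper).
example_costs = {
    ("Ш", "Sch"): 0.1, ("у", "u"): 0.1, ("б", "b"): 0.1,
    ("е", "e"): 0.1, ("р", "r"): 0.1, ("т", "t"): 0.1,
}
print(ngram_edit_distance("Шуберт", "Schubert", example_costs))
print(ngram_edit_distance("Шуберт", "Schubert", {}))
```

In this sketch the low-cost n-gram pairs make the Cyrillic and Latin spellings of the same name come out far closer than under plain Levenshtein, where every character differs. In a learning setting such a cost table would be estimated from pairs of transliterated names rather than written by hand.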

Bibliographic record

Source: Advances in Natural Language Processing | 2008 | pp. 405-416 | 12 pages
Conference location: Gothenburg (SE)
Author: Bruno Pouliquen
Original format: PDF
Language: English
Chinese Library Classification: Programming, software engineering
Keywords: transliteration; string similarity

Similar literature

1. Log Posterior Approach in Learning Rules Generated using N-Gram based Edit distance for Keyword Search [J]. M. Priya, R. Kalpana. Journal of Intelligent Systems, 2018, No. 4
2. Concept Integration using Edit Distance and N-Gram Match [J]. Vikram Singh, Pradeep Joshi, Shakti Mandhan. International Journal of Database Management Systems, 2014, No. 6
3. Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening [J]. Garcia-Hernandez Carlos, Fernandez Alberto, Serratosa Francesc. Current Topics in Medicinal Chemistry, 2020, No. 18
4. Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams [C]. Bruno Pouliquen. International Conference on Advances in Natural Language Processing, 2008
5. Fast Edit Distance Calculation Methods for NGS Sequence Similarity [D]. Islam, A. K. M. Tauhidul. 2020
6. Statistical Analysis of the Indus Script Using n-Grams [O]. Nisha Yadav, Hrishikesh Joglekar, Rajesh P. N. Rao. 2010
7. N-gram similarity and distance [O]. Grzegorz Kondrak. 2005
