
MDL-Based Models for Transliteration Generation


Abstract

This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models extend our earlier work on automatic discovery of patterns of etymological sound change, which is based on the Minimum Description Length (MDL) Principle. The models for pairwise alignment are extended with prediction algorithms that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% word-level accuracy and up to 99% symbol-level F-score. We discuss the results from several perspectives and analyze how corpus size, the language pair, the type of name (person, location), and noise in the data affect performance.
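The abstract reports word-level accuracy and symbol-level F-score without defining them. The sketch below shows one way such metrics could be computed. It assumes that word-level accuracy is the share of predictions that match the reference exactly, and that the symbol-level F-score is based on the longest common subsequence (LCS) between prediction and reference, a definition often used in transliteration evaluation. The function names and the example strings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def symbol_f_score(predicted: str, reference: str) -> float:
    """LCS-based F-score for one name pair (assumed definition)."""
    if not predicted or not reference:
        return 0.0
    lcs = lcs_length(predicted, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(predicted)  # matched symbols / predicted length
    recall = lcs / len(reference)     # matched symbols / reference length
    return 2 * precision * recall / (precision + recall)


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def mean_symbol_f_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Average the per-name symbol-level F-score over a test set."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    scores = [symbol_f_score(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical romanized predictions versus reference names.
    preds = ["moskva", "london"]
    refs = ["moscow", "london"]
    print(word_accuracy(preds, refs))
    print(mean_symbol_f_score(preds, refs))
```

Under these assumptions a single wrong symbol makes the whole word count as an error while costing the symbol-level score only slightly, which is one reason the symbol-level figure can sit well above the word-level one.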
