Conference: SLSP 2013

MDL-Based Models for Transliteration Generation

Abstract

This paper presents models for automatic transliteration of proper names between languages that use different alphabets. The models extend our work on automatic discovery of patterns of etymological sound change, based on the Minimum Description Length (MDL) Principle. The models for pairwise alignment are extended with prediction algorithms that produce transliterated names. We present results on 13 parallel corpora for 7 languages, including English, Russian, and Farsi, extracted from Wikipedia headlines. The transliteration corpora are released for public use. The models achieve up to 88% word-level accuracy and up to 99% symbol-level F-score. We discuss the results from several perspectives and analyze how corpus size, the language pair, the type of names (persons, locations), and noise in the data affect performance.