Conference on Empirical Methods in Natural Language Processing

Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

Abstract

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large number (>5000) of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which a large number of training examples is unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.
