Extraction of Name and Transliteration in Monolingual and Parallel Corpora

Abstract

Named entities in free text pose a challenge to text analysis in Machine Translation and Cross-Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system, and they are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations helps ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations from monolingual or parallel corpora with high precision and recall.
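
The abstract does not specify the form of the transliteration model, so the following is only a minimal sketch of the general idea (train on a seed list, then score candidate pairs in a parallel sentence). It assumes a character-level IBM Model 1 style EM estimator, which is a stand-in technique and not necessarily the authors' model. The seed pairs (English and Russian), the function names `train_model1`, `score`, and `extract_pairs`, and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical seed list of (source name, transliteration) pairs.
SEED = [("obama", "обама"), ("putin", "путин"),
        ("boston", "бостон"), ("madrid", "мадрид")]

def train_model1(seed_pairs, iterations=10):
    """Estimate t(target_char | source_char) with character-level EM."""
    tgt_vocab = {c for _, t in seed_pairs for c in t}
    # Start from a uniform distribution over target characters.
    t_prob = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in seed_pairs:
            for tc in tgt:
                # E-step: share each target character among source characters.
                norm = sum(t_prob[(tc, sc)] for sc in src)
                for sc in src:
                    frac = t_prob[(tc, sc)] / norm
                    count[(tc, sc)] += frac
                    total[sc] += frac
        # M-step: renormalise the expected counts per source character.
        t_prob = defaultdict(
            float, {(tc, sc): c / total[sc] for (tc, sc), c in count.items()})
    return t_prob

def score(src, tgt, t_prob, floor=1e-6):
    """Length-normalised log-likelihood of tgt given src."""
    logp = 0.0
    for tc in tgt:
        p = sum(t_prob.get((tc, sc), 0.0) for sc in src) / len(src)
        logp += math.log(max(p, floor))  # floor unseen character pairs
    return logp / len(tgt)

def extract_pairs(src_tokens, tgt_tokens, t_prob, threshold=-8.0):
    """For each source token, keep its best-scoring target token if the
    score clears the (hypothetical) threshold."""
    if not tgt_tokens:
        return []
    pairs = []
    for s in src_tokens:
        best = max(tgt_tokens, key=lambda t: score(s, t, t_prob))
        if score(s, best, t_prob) >= threshold:
            pairs.append((s, best))
    return pairs

model = train_model1(SEED)
```

Calling `extract_pairs(src_tokens, tgt_tokens, model)` on the tokens of an aligned sentence pair returns the candidate name and transliteration pairs whose score clears the threshold. In practice the threshold would need tuning on held-out data, since it trades precision against recall.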
