Journal: IEICE Transactions on Information and Systems

Translation of Untranslatable Words - Integration of Lexical Approximation and Phrase-Table Extension Techniques into Statistical Machine Translation



Abstract

This paper proposes a method for handling out-of-vocabulary (OOV) words that cannot be translated using conventional phrase-based statistical machine translation (SMT) systems. For a given OOV word, lexical approximation techniques are utilized to identify spelling and inflectional word variants that occur in the training data. All OOV words in the source sentence are then replaced with appropriate word variants found in the training corpus, thus reducing the number of OOV words in the input. Moreover, in order to increase the coverage of such word translations, the SMT translation model is extended by adding new phrase translations for all source language words that do not have a single-word entry in the original phrase-table but only appear in the context of larger phrases. The effectiveness of the proposed methods is investigated for the translation of Hindi to English, Chinese, and Japanese.
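
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how spelling and inflectional variants are scored, so the sketch swaps in plain string similarity from the standard library's `difflib`. The function names, the `cutoff` value, and the dictionary layout of the phrase table are all assumptions made for illustration. The second function only finds the source words that lack a single-word entry; how the new translations for them are derived is not described in the abstract and is left out.

```python
from difflib import get_close_matches


def replace_oov(tokens, vocab, cutoff=0.8):
    """Replace each OOV token with its closest in-vocabulary variant.

    Assumption: string similarity stands in for the paper's lexical
    approximation; tokens with no close variant are left unchanged.
    """
    candidates = sorted(vocab)  # stable candidate order
    result = []
    for tok in tokens:
        if tok in vocab:
            result.append(tok)
            continue
        matches = get_close_matches(tok, candidates, n=1, cutoff=cutoff)
        result.append(matches[0] if matches else tok)
    return result


def words_without_single_entry(phrase_table):
    """Return source words that occur only inside multi-word phrases.

    Assumption: phrase_table maps a space-separated source phrase to a
    list of target phrases (a hypothetical in-memory layout).
    """
    single = {src for src in phrase_table if len(src.split()) == 1}
    seen = {word for src in phrase_table for word in src.split()}
    return seen - single


if __name__ == "__main__":
    vocab = {"machine", "translation", "translate", "word"}
    print(replace_oov(["machine", "translatoin", "xyzzy"], vocab))

    table = {"new york": ["NY"], "york city": ["NYC"], "city": ["town"]}
    print(sorted(words_without_single_entry(table)))
```

Words returned by `words_without_single_entry` are the candidates for which the abstract's phrase-table extension would add new single-word entries.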
