
Combining linguistics and statistics for high-quality limited domain English-Chinese machine translation


Abstract

Second language learning is a compelling activity in today's global markets. This thesis focuses on the critical technology needed to produce a computer-based spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and the integration of these technologies into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long-distance constraint features. With these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. In the machine translation component, a method combining linguistic and statistical systems is introduced. English-Chinese translation goes through an intermediate language, "Zhonglish". The English-Zhonglish step uses a parse-and-paraphrase paradigm with hand-coded rules, mainly for structural reconstruction. The Zhonglish-Chinese step uses a standard phrase-based statistical machine translation system, which mostly handles word sense disambiguation and lexicon mapping. We evaluated the system on an independent test set from the IWSLT travel-domain spoken language corpus. GIZA alignment crossovers improved substantially: we obtained a 45% decrease in crossovers compared to a traditional phrase-based statistical MT system. Furthermore, the BLEU score improved by 2 points. Finally, a framework for the translation game system is described, and the feasibility of integrating the components to produce reference translations and to automatically assess students' translations is verified.
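The abstract reports alignment crossovers without defining them. As a minimal sketch, assuming a crossover is any pair of word-alignment links whose source order and target order disagree, the count could be computed as shown here. The function name `count_crossovers` and the toy alignments are hypothetical illustrations; they are not taken from the thesis or from GIZA's own tooling.

```python
from itertools import combinations


def count_crossovers(links):
    """Count pairs of alignment links that cross each other.

    `links` is an iterable of (source_index, target_index) pairs.
    Assumption: two links cross when one precedes the other on the
    source side but follows it on the target side.
    """
    unique_links = sorted(set(links))  # ignore duplicated links
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(unique_links, 2)
        if (s1 - s2) * (t1 - t2) < 0  # opposite order on the two sides
    )


# Toy alignments (hypothetical data, not from the IWSLT corpus).
monotone = [(0, 0), (1, 1), (2, 2)]   # same word order on both sides
reordered = [(0, 2), (1, 0), (2, 1)]  # first source word moves to the end

print(count_crossovers(monotone))
print(count_crossovers(reordered))
```

Under this assumed definition, fewer crossovers means the two sides of the parallel data share more of their word order, which is the effect the rule-based English-Zhonglish restructuring step is aiming for before statistical translation.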
