Journal: ACM Transactions on Asian Language Information Processing

Word Re-Segmentation in Chinese-Vietnamese Machine Translation

Abstract

In isolating languages such as Chinese and Vietnamese, word boundaries are not marked by spaces (Chinese is written without spaces, and Vietnamese places spaces between syllables rather than between words), and a word may consist of one or more syllables. Word segmentation (WS) is therefore usually the first step in the machine translation pipeline. Because the source- and target-language segmenters are trained on different corpora and may use different approaches, the resulting segmentations of the two languages are often not homologous. Word alignment in statistical machine translation consequently produces many 1-n and n-1 alignment pairs, which degrades translation performance. In this article, we adjust the WS of Chinese and Vietnamese specifically, and of isolating-language pairs generally, to make the word boundaries of the two languages more symmetric, thereby strengthening 1-1 alignments and improving machine translation performance. We tested this method on the Computational Linguistics Center's corpus, which consists of 35,623 sentence pairs. The experimental results show that our method significantly improves translation performance compared with the baseline translation system, the WS translation system, and anchor-language-based WS translation systems.
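To make the 1-n problem concrete, here is a minimal illustrative sketch, not the authors' method (the abstract does not describe their algorithm). It assumes a word alignment given as a set of `(source_index, target_index)` pairs and applies a simple, hypothetical heuristic: when one source word aligns to several contiguous target tokens that align to nothing else, those target tokens are merged into a single word (joined with `_`, a common convention for multi-syllable Vietnamese words). The function names, the example sentence, and the alignment are all hypothetical.

```python
from collections import defaultdict


def _links(alignment):
    """Build source->targets and target->sources link maps."""
    src_links, tgt_links = defaultdict(set), defaultdict(set)
    for s, t in alignment:
        src_links[s].add(t)
        tgt_links[t].add(s)
    return src_links, tgt_links


def alignment_types(alignment):
    """Count alignment links by type: 1-1, 1-n, n-1, or m-n (many-to-many)."""
    src_links, tgt_links = _links(alignment)
    counts = {"1-1": 0, "1-n": 0, "n-1": 0, "m-n": 0}
    for s, t in alignment:
        fan_out = len(src_links[s])  # target tokens linked to this source word
        fan_in = len(tgt_links[t])   # source words linked to this target token
        if fan_out == 1 and fan_in == 1:
            counts["1-1"] += 1
        elif fan_in == 1:
            counts["1-n"] += 1
        elif fan_out == 1:
            counts["n-1"] += 1
        else:
            counts["m-n"] += 1
    return counts


def merge_target_spans(tgt, alignment, joiner="_"):
    """Hypothetical re-segmentation: merge contiguous target tokens that
    align exclusively to one source word, then remap the alignment."""
    src_links, tgt_links = _links(alignment)
    group_start = {}  # target index -> first index of its merged group
    for s, ts in src_links.items():
        ts = sorted(ts)
        contiguous = ts == list(range(ts[0], ts[-1] + 1))
        exclusive = all(tgt_links[t] == {s} for t in ts)
        if len(ts) > 1 and contiguous and exclusive:
            for t in ts:
                group_start[t] = ts[0]
    new_tgt, old_to_new = [], {}
    for i, tok in enumerate(tgt):
        if group_start.get(i, i) != i:
            new_tgt[-1] += joiner + tok  # continue the current merged word
        else:
            new_tgt.append(tok)
        old_to_new[i] = len(new_tgt) - 1
    new_alignment = {(s, old_to_new[t]) for s, t in alignment}
    return new_tgt, new_alignment


# Hypothetical pair: 我 是 学生 / tôi là học sinh ("I am a student").
alignment = {(0, 0), (1, 1), (2, 2), (2, 3)}  # 学生 -> học, sinh (1-2)
print(alignment_types(alignment))
new_tgt, new_alignment = merge_target_spans(["tôi", "là", "học", "sinh"], alignment)
print(new_tgt, alignment_types(new_alignment))
```

Here the two-syllable Vietnamese word "học sinh" is merged so that it aligns 1-1 with 学生. The mirror case (n-1) would merge source-side tokens analogously; a real system would also need to decide when to split words rather than merge them, which this sketch does not attempt.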
