ACM Transactions on Asian Language Information Processing

Linguistic-Relationships-Based Approach for Improving Word Alignment




Unsupervised word alignment tools such as GIZA++ are widely used in phrase-based statistical machine translation. The quality of the resulting alignment model depends on the size and quality of the bilingual corpus. For low-resource language pairs such as Chinese and Vietnamese, however, unsupervised word alignment is sometimes of low quality because the data are sparse. In addition, the unsupervised model does not exploit linguistic relationships between the two languages to improve word alignment. Chinese and Vietnamese belong to the same language type and have close linguistic relationships. In this article, we integrate the characteristics of these linguistic relationships into the word alignment model to improve the quality of Chinese-Vietnamese word alignment. The relationships we use are Sino-Vietnamese words and content words. The experimental results show that our method improves both the performance of word alignment and the quality of machine translation.


