首页> 中文期刊>计算机、材料和连续体(英文) >Corpus Augmentation for Improving Neural Machine Translation

Corpus Augmentation for Improving Neural Machine Translation

     

摘要

The translation quality of neural machine translation(NMT)systems depends largely on the quality of large-scale bilingual parallel corpora available.Research shows that under the condition of limited resources,the performance of NMT is greatly reduced,and a large amount of high-quality bilingual parallel data is needed to train a competitive translation model.However,not all languages have large-scale and high-quality bilingual corpus resources available.In these cases,improving the quality of the corpora has become the main focus to increase the accuracy of the NMT results.This paper proposes a new method to improve the quality of data by using data cleaning,data expansion,and other measures to expand the data at the word and sentence-level,thus improving the richness of the bilingual data.The long short-term memory(LSTM)language model is also used to ensure the smoothness of sentence construction in the process of sentence construction.At the same time,it uses a variety of processing methods to improve the quality of the bilingual data.Experiments using three standard test sets are conducted to validate the proposed method;the most advanced fairseq-transformer NMT system is used in the training.The results show that the proposed method has worked well on improving the translation results.Compared with the state-of-the-art methods,the BLEU value of our method is increased by 2.34 compared with that of the baseline.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号