Corpus Augmentation for Improving Neural Machine Translation

Zijian Li; Chengying Chi; Yunyun Zhan

Abstract

The translation quality of neural machine translation (NMT) systems depends largely on the quality of the large-scale bilingual parallel corpora available. Research shows that NMT performance drops sharply under limited-resource conditions, and that a large amount of high-quality bilingual parallel data is needed to train a competitive translation model. However, not all languages have large-scale, high-quality bilingual corpus resources. In these cases, improving the quality of the corpora becomes the main way to increase the accuracy of NMT output. This paper proposes a new method that improves data quality through data cleaning, data expansion, and other measures, expanding the data at the word and sentence levels and thereby enriching the bilingual data. A long short-term memory (LSTM) language model is used to keep the constructed sentences fluent. A variety of further processing methods are applied to improve the quality of the bilingual data. Experiments on three standard test sets validate the proposed method; training uses the state-of-the-art fairseq Transformer NMT system. The results show that the proposed method improves translation quality: in a comparison with state-of-the-art methods, the BLEU score of our method is 2.34 higher than that of the baseline.
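The abstract names data cleaning as one step but does not list the rules used. As a rough illustration only, the sketch below applies heuristics that are common when cleaning parallel corpora: dropping empty pairs, overlong sentences, pairs with a mismatched length ratio, and exact duplicates. The function name `clean_parallel_corpus`, the thresholds `max_len` and `max_ratio`, and the whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
def clean_parallel_corpus(pairs, max_len=100, max_ratio=3.0):
    """Filter (source, target) sentence pairs with simple heuristics.

    Hypothetical sketch: thresholds and whitespace tokenization are
    illustrative assumptions, not the rules used in the paper.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Normalize runs of whitespace so duplicates compare equal.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())

        # Drop pairs where either side is empty.
        if n_src == 0 or n_tgt == 0:
            continue
        # Drop pairs where either side is too long.
        if n_src > max_len or n_tgt > max_len:
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop exact duplicates, keeping the first occurrence.
        if (src, tgt) in seen:
            continue

        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    sample = [
        ("hello  world", "hallo welt"),   # kept after whitespace normalization
        ("hello world", "hallo welt"),    # duplicate, dropped
        ("", "leer"),                     # empty source, dropped
        ("a", "b c d e"),                 # length ratio 4 > 3, dropped
    ]
    print(clean_parallel_corpus(sample))
```

Whitespace splitting is a stand-in for real tokenization; a language written without spaces, such as Chinese, would need word segmentation before the length checks are meaningful.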

Publication details

Source
Computers, Materials & Continua (计算机、材料和连续体(英文)) | 2020, Issue 7 | pp. 637-650 | 14 pages
Authors
Zijian Li; Chengying Chi; Yunyun Zhan
Affiliations

University of Science and Technology Liaoning, Anshan, 114031, China

College of Science & Health, Technological University Dublin, Dublin, D08 X622, Ireland

Original format: PDF
Text language: chi
CLC classification: English
Keywords
Neural machine translation; corpus augmentation; model improvement; deep learning; data cleaning

