
Transfer Learning for Sequence Generation: from Single-source to Multi-source



Abstract

Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources as input, including automatic post-editing, multi-source translation, and multi-document summarization. Since MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. Although concatenating multiple sources into a single long sequence and directly finetuning a pretrained model on it is a simple way to transfer pretrained models to MSG tasks, we conjecture that direct finetuning leads to catastrophic forgetting, and that relying solely on pretrained self-attention layers to capture cross-source information is insufficient. We therefore propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder to learn better representations for MSG tasks. Experiments show that our approach achieves new state-of-the-art results on the WMT17 APE task and on a multi-source translation task using the WMT14 test set. When adapted to document-level translation, our framework significantly outperforms strong baselines.
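To make the "concatenate multiple sources into a single long sequence" baseline concrete, the following is a minimal sketch in Python using a Hugging Face Transformers pretrained sequence-to-sequence checkpoint. The model name, separator string, and the automatic post-editing example inputs are illustrative assumptions, not the authors' actual setup.

    # Minimal sketch (not the authors' code) of the single-sequence
    # concatenation baseline for MSG: multiple sources are joined with a
    # separator and fed to a pretrained seq2seq model as one long input.
    # Model name and separator token are assumptions for illustration.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "facebook/mbart-large-cc25"  # any pretrained seq2seq checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    def concat_sources(sources, sep=" </s> "):
        # For automatic post-editing, the sources would be, e.g.,
        # (machine translation output, original source sentence).
        return sep.join(sources)

    inputs = tokenizer(
        concat_sources(["Das ist ein Test .", "This is a test ."]),
        return_tensors="pt",
    )
    output_ids = model.generate(**inputs, max_length=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Under this baseline, only the pretrained self-attention layers can relate tokens across the two sources, which is the limitation the abstract's conjecture points at.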
