
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Abstract

We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT (Devlin et al., 2018) and XLM (Lample and Conneau, 2019), three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together brings further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, a 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, a 5.5% averaged accuracy improvement (on French and German) is obtained.
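As an illustration of the cross-lingual masked language model mentioned in the abstract, the minimal sketch below builds a single input sequence from a parallel sentence pair and masks a fraction of its tokens, so that a masked word can be predicted from context in either language. The function name, special tokens and masking rate are assumptions made for this sketch, not the authors' exact implementation.

```python
import random

MASK, SEP = "[MASK]", "[SEP]"

def mask_cross_lingual_pair(src_tokens, tgt_tokens, mask_prob=0.15, seed=0):
    """Sketch of cross-lingual MLM input construction (assumed format):
    concatenate a parallel sentence pair and mask ~15% of its tokens,
    so each masked word can be recovered from either language's context."""
    rng = random.Random(seed)
    tokens = src_tokens + [SEP] + tgt_tokens      # one joint bilingual sequence
    inputs, labels = [], []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            inputs.append(MASK)                   # hide the token from the model
            labels.append(tok)                    # model must recover it
        else:
            inputs.append(tok)
            labels.append(None)                   # not a prediction target
    return inputs, labels

if __name__ == "__main__":
    en = "the cat sits on the mat".split()
    fr = "le chat est assis sur le tapis".split()
    inputs, labels = mask_cross_lingual_pair(en, fr)
    print(inputs)
    print(labels)
```

Concatenating the translation pair is what distinguishes this objective from a monolingual masked language model: tokens from the other language are available as additional clues for each masked position.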
