
Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Abstract

We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT (Devlin et al., 2018) and XLM (Lample and Conneau, 2019), three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and cross-lingual masked language model. These tasks help Unicoder learn the mappings among different languages from more perspectives. We also find that fine-tuning on multiple languages together brings further improvement. Experiments are performed on two tasks: cross-lingual natural language inference (XNLI) and cross-lingual question answering (XQA), where XLM is our baseline. On XNLI, a 1.8% averaged accuracy improvement (on 15 languages) is obtained. On XQA, which is a new cross-lingual dataset built by us, a 5.5% averaged accuracy improvement (on French and German) is obtained.
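As an illustration of the cross-lingual masked language model mentioned in the abstract, the minimal sketch below builds a single input sequence from a parallel sentence pair and masks a fraction of its tokens, so that a masked word can be predicted from context in either language. The function name, special tokens and masking rate are assumptions made for this sketch, not the authors' exact implementation.

```python
import random

MASK, SEP = "[MASK]", "[SEP]"

def mask_cross_lingual_pair(src_tokens, tgt_tokens, mask_prob=0.15, seed=0):
    """Sketch of cross-lingual MLM input construction (assumed format):
    concatenate a parallel sentence pair and mask ~15% of its tokens,
    so each masked word can be recovered from either language's context."""
    rng = random.Random(seed)
    tokens = src_tokens + [SEP] + tgt_tokens      # one joint bilingual sequence
    inputs, labels = [], []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            inputs.append(MASK)                   # hide the token from the model
            labels.append(tok)                    # model must recover it
        else:
            inputs.append(tok)
            labels.append(None)                   # not a prediction target
    return inputs, labels

if __name__ == "__main__":
    en = "the cat sits on the mat".split()
    fr = "le chat est assis sur le tapis".split()
    inputs, labels = mask_cross_lingual_pair(en, fr)
    print(inputs)
    print(labels)
```

Concatenating the translation pair is what distinguishes this objective from a monolingual masked language model: tokens from the other language are available as additional clues for each masked position.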
