Syntax-augmented Multilingual BERT for Cross-lingual Transfer

Abstract

In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning. However, due to typological differences across languages, cross-lingual transfer is challenging. Nevertheless, language syntax, e.g., syntactic dependencies, can bridge the typological gap. Previous work has shown that pre-trained multilingual encoders, such as mBERT (Devlin et al., 2019), capture language syntax, helping cross-lingual transfer. This work shows that explicitly providing language syntax and training mBERT with an auxiliary objective to encode the universal dependency tree structure helps cross-lingual transfer. We perform rigorous experiments on four NLP tasks: text classification, question answering, named entity recognition, and task-oriented semantic parsing. The experimental results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks, such as PAWS-X and MLQA, by 1.4 and 1.6 points on average across all languages. In the generalized transfer setting, performance is boosted significantly, by 3.9 and 3.1 points on average on PAWS-X and MLQA.
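To make the auxiliary-objective idea more concrete, below is a minimal illustrative sketch of one way dependency structure could be injected during mBERT fine-tuning: token representations score every candidate head for each token (a bilinear arc scorer), and the resulting head-prediction cross-entropy is added to the main task loss. This is an assumption for illustration only, not the authors' implementation; the class name, the head_ids input, and the 0.5 auxiliary-loss weight are placeholders.

# Illustrative sketch (not the paper's released code): mBERT fine-tuning with an
# auxiliary dependency-head prediction loss, assuming gold UD head indices are
# available for the source-language training data.
import torch
import torch.nn as nn
from transformers import BertModel

class SyntaxAugmentedMBert(nn.Module):
    def __init__(self, num_labels, model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_labels)   # main task head ([CLS] classification)
        self.arc_query = nn.Linear(hidden, hidden)         # auxiliary head: dependent representation
        self.arc_key = nn.Linear(hidden, hidden)           # auxiliary head: candidate-head representation

    def forward(self, input_ids, attention_mask, labels=None, head_ids=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state                     # (batch, seq, dim)
        task_logits = self.classifier(hidden[:, 0])        # [CLS] vector for the main task

        loss = None
        if labels is not None:
            loss = nn.functional.cross_entropy(task_logits, labels)
        if head_ids is not None:
            # Bilinear arc scores: each token scores every position as its dependency head.
            q = self.arc_query(hidden)                      # (batch, seq, dim)
            k = self.arc_key(hidden)                        # (batch, seq, dim)
            arc_scores = torch.matmul(q, k.transpose(1, 2)) # (batch, seq, seq)
            aux_loss = nn.functional.cross_entropy(
                arc_scores.view(-1, arc_scores.size(-1)),
                head_ids.view(-1),
                ignore_index=-100,                          # mask padding / non-first subwords
            )
            # 0.5 is an assumed weighting between task and auxiliary objectives.
            loss = aux_loss if loss is None else loss + 0.5 * aux_loss
        return task_logits, loss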
