
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling



Abstract

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: In ELMo's pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research.
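To make the two transfer setups contrasted in the abstract concrete, here is a minimal sketch (not the authors' code) of a frozen pretrained encoder with a trained task head versus full fine-tuning, written with the Hugging Face transformers library; the model name, example task, and all hyperparameters are illustrative assumptions rather than the paper's experimental configuration.

```python
# Illustrative sketch only: contrasts pretrain-then-freeze vs. fine-tuning.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# (a) Pretrain-then-freeze (ELMo-style): keep the pretrained encoder fixed
# and train only a lightweight task-specific head on top of it.
for param in encoder.parameters():
    param.requires_grad = False
classifier_head = nn.Linear(encoder.config.hidden_size, 2)  # hypothetical 2-way target task

# (b) Fine-tuning (BERT-style): instead leave encoder parameters trainable so
# the whole model is updated on the intermediate or target task.

batch = tokenizer(["The movie was great."], return_tensors="pt")
with torch.no_grad():  # drop no_grad() when fine-tuning the encoder
    outputs = encoder(**batch)
cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token as sentence representation
logits = classifier_head(cls_embedding)
```

In the frozen setup only `classifier_head` receives gradients, so downstream performance directly probes what the fixed sentence encoder has learned; in the fine-tuning setup the encoder itself adapts, which is the regime where the abstract reports that intermediate-task training can hurt downstream transfer.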

