
Can You Tell Me How to Get Past Sesame Street? Sentence-Level Pretraining Beyond Language Modeling



Abstract

Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling. We conduct the first large-scale systematic study of candidate pretraining tasks, comparing 19 different tasks both as alternatives and complements to language modeling. Our primary results support the use of language modeling, especially when combined with pretraining on additional labeled-data tasks. However, our results are mixed across pretraining tasks and show some concerning trends: In ELMo's pretrain-then-freeze paradigm, random baselines are worryingly strong and results vary strikingly across target tasks. In addition, fine-tuning BERT on an intermediate task often negatively impacts downstream transfer. In a more positive trend, we see modest gains from multitask training, suggesting the development of more sophisticated multitask and transfer learning techniques as an avenue for further research.
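To make the two transfer setups contrasted in the abstract concrete, here is a minimal sketch (not the authors' code) of a frozen pretrained encoder with a trained task head versus full fine-tuning, written with the Hugging Face transformers library; the model name, example task, and all hyperparameters are illustrative assumptions rather than the paper's experimental configuration.

```python
# Illustrative sketch only: contrasts pretrain-then-freeze vs. fine-tuning.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# (a) Pretrain-then-freeze (ELMo-style): keep the pretrained encoder fixed
# and train only a lightweight task-specific head on top of it.
for param in encoder.parameters():
    param.requires_grad = False
classifier_head = nn.Linear(encoder.config.hidden_size, 2)  # hypothetical 2-way target task

# (b) Fine-tuning (BERT-style): instead leave encoder parameters trainable so
# the whole model is updated on the intermediate or target task.

batch = tokenizer(["The movie was great."], return_tensors="pt")
with torch.no_grad():  # drop no_grad() when fine-tuning the encoder
    outputs = encoder(**batch)
cls_embedding = outputs.last_hidden_state[:, 0]  # [CLS] token as sentence representation
logits = classifier_head(cls_embedding)
```

In the frozen setup only `classifier_head` receives gradients, so downstream performance directly probes what the fixed sentence encoder has learned; in the fine-tuning setup the encoder itself adapts, which is the regime where the abstract reports that intermediate-task training can hurt downstream transfer.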

