Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization



Abstract

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and that training a certain collection of them (i.e., a subnetwork) can match the performance of the full model. In this paper, we study such a collection of tickets, referred to as "winning tickets", in extremely over-parametrized models, e.g., pre-trained language models. We observe that at certain compression ratios, the generalization performance of the winning tickets can not only match but also exceed that of the full model. In particular, we observe a phase transition phenomenon: as the compression ratio increases, the generalization performance of the winning tickets first improves and then deteriorates after a certain threshold. We refer to the tickets at the threshold as "super tickets". We further show that the phase transition is task- and model-dependent: as the model size becomes larger and the training data set becomes smaller, the transition becomes more pronounced. Our experiments on the GLUE benchmark show that the super tickets improve single-task fine-tuning by 0.9 points on BERT-base and 1.0 points on BERT-large in terms of task-average score. We also demonstrate that adaptively sharing the super tickets across tasks benefits multi-task learning.
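The abstract does not spell out how the subnetworks are obtained, so the sketch below is only a toy illustration under our own assumptions, not the paper's method. It assumes that prunable units (e.g. attention heads) already carry a precomputed importance score, that the "compression ratio" is the fraction of units removed, and that the super ticket is the ratio whose subnetwork scores best on a development set, i.e. the peak of the improve-then-deteriorate curve. The function names `prune_mask` and `select_super_ticket` are hypothetical.

```python
from typing import Dict, List


def prune_mask(importance: List[float], ratio: float) -> List[int]:
    """Return a 0/1 mask that removes the `ratio` fraction of units
    (hypothetically, attention heads) with the lowest importance scores."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n_prune = int(round(ratio * len(importance)))
    # Unit indices ordered from least to most important.
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(importance))]


def select_super_ticket(dev_score: Dict[float, float]) -> float:
    """Given dev-set scores of winning tickets keyed by compression ratio,
    return the ratio with the highest score (the peak of the curve).
    Ties are broken toward the higher ratio, i.e. the smaller subnetwork."""
    return max(dev_score, key=lambda r: (dev_score[r], r))


if __name__ == "__main__":
    # Toy numbers invented only to mimic the improve-then-deteriorate shape;
    # they are NOT results from the paper.
    toy_scores = {0.0: 80.0, 0.2: 81.5, 0.4: 82.0, 0.6: 79.0}
    best = select_super_ticket(toy_scores)
    print(best)  # 0.4
    # Hypothetical importance scores for five heads.
    print(prune_mask([0.8, 0.2, 0.5, 0.1, 0.9], best))  # [1, 0, 1, 0, 1]
```

Selecting the ratio by a held-out score rather than fixing it in advance is what lets the sketch reflect the phase transition described above: the best ratio depends on the task and model, so it has to be found empirically.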


Bibliographic details

Source
International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics | 2021 | pp. 6524-6538 | 15 pages
Authors
Chen Liang; Simiao Zuo; Minshuo Chen; Haoming Jiang; Xiaodong Liu; Pengcheng He; Tuo Zhao; Weizhu Chen
Original format: PDF

Similar articles


1. SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-Trained Language Model [J]. Sun Yi, Qiu Hangping, Zheng Yu. Quality Control, Transactions, 2020.
2. Comparing pre-trained language models for Spanish hate speech detection [J]. Flor Miriam Plaza-del-Arco, M. Dolores Molina-Gonzalez, L. Alfonso Urena-Lopez. Expert Systems with Applications, 2021.
3. Injecting Event Knowledge into Pre-Trained Language Models for Event Extraction [J]. Zining Yang, Siyu Zhan, Mengshu Hou. Computer Science & Information Technology, 2020, no. 14.
4. Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models [C]. Ethan Wilcox, Peng Qian, Richard Futrell. Conference on Empirical Methods in Natural Language Processing, 2020.
5. Improving Neural Language Models with Black-Box Analysis and Generalization Through Memorization [D]. Urvashi Khandelwal. 2021.
6. Relation Extraction from Clinical Narratives Using Pre-trained Language Models [O]. Qiang Wei, Zongcheng Ji, Yuqi Si. 2019.
7. From Bag-of-Words to Pre-trained Neural Language Models: Improving Automatic Classification of App Reviews for Requirements Engineering [O]. Adailton Araujo, Marcos Golo, Breno Viana. 2020.