International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models



Abstract

Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow the default setting of architecture hyper-parameters in BERT (Devlin et al., 2019) (e.g., the hidden dimension is a quarter of the intermediate dimension in feed-forward sub-networks). Few studies have explored the design of architecture hyper-parameters in BERT, especially for the more efficient PLMs with tiny sizes, which are essential for practical deployment on resource-constrained devices. In this paper, we adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient way to develop tiny PLMs for various latency constraints. We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks. Extensive experiments show that our method outperforms both the SOTA search-based baseline (NAS-BERT) and the SOTA distillation-based methods (such as DistilBERT, TinyBERT, MiniLM and MobileBERT). In addition, based on the obtained architectures, we propose a more efficient development method that is even faster than the development of a single PLM.
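To make the one-shot NAS idea in the abstract concrete, the following is a minimal, hypothetical sketch of a sample-and-select loop under a latency budget: a super-model is assumed to have already been trained once, candidate sub-architectures (layer count, hidden dimension, FFN dimension, head count) are sampled from a search space, candidates over the budget are discarded, and the rest are scored with inherited weights. All names, search-space values, and the latency/score proxies below are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical search space over architecture hyper-parameters.
# The concrete values are illustrative, not the paper's search space.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5, 6],
    "hidden_dim": [128, 192, 256, 320, 384],
    "ffn_dim": [256, 384, 512, 640, 768],
    "num_heads": [2, 4, 6, 8],
}


def sample_architecture():
    """Sample one candidate sub-architecture from the search space."""
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}


def estimated_latency(arch):
    """Toy latency proxy: roughly proportional to per-layer compute (placeholder)."""
    return arch["num_layers"] * (arch["hidden_dim"] + arch["ffn_dim"]) / 1e3


def evaluate_with_inherited_weights(arch):
    """Placeholder for scoring a sub-model that inherits weights from the
    one-shot super-model; a random proxy score stands in for a dev-set metric."""
    return random.random()


def search(latency_budget, num_candidates=100):
    """Sample candidates, keep those under the latency budget,
    and return the best-scoring architecture."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_candidates):
        arch = sample_architecture()
        if estimated_latency(arch) > latency_budget:
            continue
        score = evaluate_with_inherited_weights(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch


if __name__ == "__main__":
    print(search(latency_budget=3.0))
```

In the actual method, the evaluation step would fine-tune or score sub-models that share parameters with the one-shot super-model rather than return a random proxy; this sketch only illustrates the control flow of constrained architecture sampling.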


