International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models



Abstract

Pre-trained language models (PLMs) have achieved great success in natural language processing. Most PLMs follow the default setting of architecture hyper-parameters in BERT (Devlin et al., 2019) (e.g., the hidden dimension is a quarter of the intermediate dimension in feed-forward sub-networks). Few studies have explored the design of architecture hyper-parameters in BERT, especially for the more efficient PLMs with tiny sizes, which are essential for practical deployment on resource-constrained devices. In this paper, we adopt one-shot Neural Architecture Search (NAS) to automatically search architecture hyper-parameters. Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient way to develop tiny PLMs for various latency constraints. We name our method AutoTinyBERT and evaluate its effectiveness on the GLUE and SQuAD benchmarks. Extensive experiments show that our method outperforms both the SOTA search-based baseline (NAS-BERT) and the SOTA distillation-based methods (such as DistilBERT, TinyBERT, MiniLM and MobileBERT). In addition, based on the obtained architectures, we propose a more efficient development method that is even faster than the development of a single PLM.
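To make the one-shot NAS idea in the abstract concrete, the following is a minimal, hypothetical sketch of a sample-and-select loop under a latency budget: a super-model is assumed to have already been trained once, candidate sub-architectures (layer count, hidden dimension, FFN dimension, head count) are sampled from a search space, candidates over the budget are discarded, and the rest are scored with inherited weights. All names, search-space values, and the latency/score proxies below are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical search space over architecture hyper-parameters.
# The concrete values are illustrative, not the paper's search space.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4, 5, 6],
    "hidden_dim": [128, 192, 256, 320, 384],
    "ffn_dim": [256, 384, 512, 640, 768],
    "num_heads": [2, 4, 6, 8],
}


def sample_architecture():
    """Sample one candidate sub-architecture from the search space."""
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}


def estimated_latency(arch):
    """Toy latency proxy: roughly proportional to per-layer compute (placeholder)."""
    return arch["num_layers"] * (arch["hidden_dim"] + arch["ffn_dim"]) / 1e3


def evaluate_with_inherited_weights(arch):
    """Placeholder for scoring a sub-model that inherits weights from the
    one-shot super-model; a random proxy score stands in for a dev-set metric."""
    return random.random()


def search(latency_budget, num_candidates=100):
    """Sample candidates, keep those under the latency budget,
    and return the best-scoring architecture."""
    best_arch, best_score = None, float("-inf")
    for _ in range(num_candidates):
        arch = sample_architecture()
        if estimated_latency(arch) > latency_budget:
            continue
        score = evaluate_with_inherited_weights(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch


if __name__ == "__main__":
    print(search(latency_budget=3.0))
```

In the actual method, the evaluation step would fine-tune or score sub-models that share parameters with the one-shot super-model rather than return a random proxy; this sketch only illustrates the control flow of constrained architecture sampling.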


