Annual Meeting of the Association for Computational Linguistics

A Mixture of h - 1 Heads is Better than h Heads

Abstract

Multi-head attentive neural architectures have achieved state-of-the-art results on a variety of natural language processing tasks. Evidence has shown that they are overparameterized; attention heads can be pruned without significant performance loss. In this work, we instead "reallocate" them: the model learns to activate different heads on different inputs. Drawing connections between multi-head attention and mixture of experts, we propose the mixture of attentive experts model (MAE). MAE is trained using a block coordinate descent algorithm that alternates between updating (1) the responsibilities of the experts and (2) their parameters. Experiments on machine translation and language modeling show that MAE outperforms strong baselines on both tasks. In particular, on the WMT14 English-German translation dataset, MAE improves over "transformer-base" by 0.8 BLEU, with a comparable number of parameters. Our analysis shows that our model learns to specialize different experts to different inputs.
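The abstract outlines the key ingredients of MAE: several attentive experts derived from a single h-head attention layer, an input-dependent gate that assigns responsibilities over them, and a block coordinate descent procedure that alternates between updating responsibilities and expert parameters. The following is a minimal PyTorch sketch of that idea only; it assumes a leave-one-head-out expert construction (expert i is the same layer with head i disabled, so h-1 heads are active) and mean-pooled gating, and all module and variable names are illustrative rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfAttentiveExperts(nn.Module):
    """Sketch: expert i is the shared h-head attention with head i masked out."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Gate: maps a pooled summary of the input to h expert logits
        # ("responsibilities" after the softmax).
        self.gate = nn.Linear(d_model, n_heads)

    def _attend(self, x, head_mask):
        # Standard scaled dot-product self-attention over h heads;
        # head_mask (shape [h]) zeroes out the dropped head's output.
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):
            return t.view(B, T, self.h, self.d_head).transpose(1, 2)  # [B, h, T, d_head]

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        ctx = F.softmax(scores, dim=-1) @ v               # [B, h, T, d_head]
        ctx = ctx * head_mask.view(1, self.h, 1, 1)       # disable one head
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))

    def forward(self, x):                                 # x: [B, T, d_model]
        # Input-dependent responsibilities over the h experts.
        g = F.softmax(self.gate(x.mean(dim=1)), dim=-1)   # [B, h]
        expert_outputs = []
        for i in range(self.h):
            head_mask = torch.ones(self.h, device=x.device)
            head_mask[i] = 0.0                            # expert i drops head i
            expert_outputs.append(self._attend(x, head_mask))
        experts = torch.stack(expert_outputs, dim=1)      # [B, h, T, d_model]
        # Responsibility-weighted combination of the h experts.
        return (g.view(-1, self.h, 1, 1) * experts).sum(dim=1)


# Example usage (hypothetical sizes)
x = torch.randn(2, 5, 64)
layer = MixtureOfAttentiveExperts(d_model=64, n_heads=8)
y = layer(x)                                              # [2, 5, 64]
```

In a sketch like this, the block coordinate descent described in the abstract would correspond to alternating optimizer steps that update only the gate (the responsibilities) and only the attention/projection weights (the expert parameters), rather than updating everything jointly.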
