Adaptive Attention Span in Transformers

Abstract

We propose a novel self-attention mechanism that can learn its optimal attention span. This allows us to significantly extend the maximum context size used in Transformers while maintaining control over their memory footprint and computational time. We show the effectiveness of our approach on character-level language modeling, where we achieve state-of-the-art performance on text8 and enwik8 using a maximum context of 8k characters.
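The abstract does not say how a span can be learned by gradient descent. Below is a minimal sketch, assuming a piecewise-linear soft mask m_z(x) = min(max((R + z - x) / R, 0), 1), where x is the distance from a query to a key, z is the learnable span and R is a fixed ramp width. The mask formula and the helper names `soft_span_mask`, `masked_attention_weights` and `effective_span` are illustrative assumptions, not the authors' code. The mask scales the exponentiated attention scores before normalization, so keys farther than z + R receive exactly zero weight, while the linear ramp keeps the weights a continuous function of z.

```python
import math


def soft_span_mask(distance, z, ramp):
    """Assumed piecewise-linear soft mask (hypothetical helper): 1 up to
    distance z, then a linear ramp down to 0 over `ramp` positions."""
    return min(max((ramp + z - distance) / ramp, 0.0), 1.0)


def masked_attention_weights(scores, z, ramp):
    """Softmax over keys, reweighted by the assumed soft span mask.

    scores[d] is the raw query-key score for the key at distance d from the
    query (d = 0 is the nearest key). Assumes z >= 0, so the nearest key
    always has mask value 1 and the normalizer is never zero.
    """
    peak = max(scores)  # subtract the max score for numerical stability
    masked = [
        soft_span_mask(d, z, ramp) * math.exp(s - peak)
        for d, s in enumerate(scores)
    ]
    total = sum(masked)
    return [w / total for w in masked]


def effective_span(z, ramp):
    """Number of keys with a nonzero mask, i.e. distances d < z + ramp.
    Only these keys need to be scored, which bounds memory and compute."""
    return max(0, math.ceil(z + ramp))


if __name__ == "__main__":
    weights = masked_attention_weights([0.0] * 6, z=2, ramp=2)
    print([round(w, 3) for w in weights])
    print(effective_span(2, 2))
```

With equal scores, z = 2 and R = 2, the three nearest keys share most of the weight, the key at distance 3 gets half as much, and keys at distance 4 or more get none, so only `effective_span(2, 2)` keys would have to be computed. Under this assumed formulation, that cutoff is what would let a model allow a large maximum context while paying only for the span each head actually learns.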

Bibliographic record

Source
Annual Meeting of the Association for Computational Linguistics | 2019 | cxxxiv, 659 p. | 5 pages
Authors
Sainbayar Sukhbaatar; Edouard Grave; Piotr Bojanowski; Armand Joulin
Original format: PDF
CLC classification: Programming; software engineering

Similar documents

1. 250 kA compact linear transformer driver for wire array z-pinch loads [J]. S. C. Bott, D. M. Haas, R. E. Madden. Physical Review. Accelerators and Beams, 2011, No. 5.

2. Modulation of the attentional span by foveal and parafoveal task load: An ERP study using attentional probes [J]. Kornrumpf Benthe, Sommer Werner. Psychophysiology, 2015, No. 9.

3. Examining the Relative Contribution of Memory Updating, Attention Focus Switching, and Sustained Attention to Children’s Verbal Working Memory Span [J]. Beula M. Magimairaj, James W. Montgomery. Child Development Research, 2013, No. 3.

4. Adaptive Attention Span in Transformers [C]. Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski. Annual Meeting of the Association for Computational Linguistics, 2019.

5. The Contributing Role of Processing Speed and Attention Span on Learning Efficiency in a Cohort of Clinically Referred Female Cancer Survivors [D]. Walters, Darrell M. 2020.

6. Component Analysis of Simple Span vs. Complex Span Adaptive Working Memory Exercises: A Randomized Controlled Trial [O]. Bradley S. Gibson, William G. Kronenberger, Dawn M. Gondoli.

7. Span of Control and Span of Attention [O]. Oriana Bandiera, Andrea Prat, Raffaella Sadun. 2013.

8. Experimental Investigation of Improving Human Problem-Solving Performance by Guiding Attention and Adaptively Proving Details on Information Displays [R]. Narayanan, N. H. 2007.

