Adaptive Attention Span in Transformers

Abstract

We propose a novel self-attention mechanism that can learn its optimal attention span. This allows us to significantly extend the maximum context size used in Transformers while maintaining control over their memory footprint and computation time. We show the effectiveness of our approach on character-level language modeling, where we achieve state-of-the-art performance on text8 and enwik8 using a maximum context of 8k characters.
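The abstract does not spell out how a span can be learned. One way to make a span differentiable, sketched here under our own assumptions, is to multiply each attention weight by a soft mask that equals 1 for keys within a learnable distance `z`, falls linearly to 0 over a ramp of width `ramp`, and is 0 beyond that, then renormalize. The function names `soft_span_mask` and `masked_attention_weights`, the linear ramp, and the convention that `scores[k]` is the score for the key `k` positions back are hypothetical choices for illustration, not the authors' code.

```python
import math
from typing import List


def soft_span_mask(distance: int, z: float, ramp: float) -> float:
    """Hypothetical soft mask: 1 within span z, linear ramp of width `ramp`, then 0.

    Assumption: a piecewise-linear ramp; `z` plays the role of the learnable
    span and `ramp` is a fixed hyperparameter controlling the softness.
    """
    value = (ramp + z - distance) / ramp
    return min(max(value, 0.0), 1.0)


def masked_attention_weights(scores: List[float], z: float, ramp: float) -> List[float]:
    """Attention weights for one query over its past keys.

    scores[k] is the raw score for the key at distance k from the query
    (k = 0 is the current position). Each weight is mask(k) * exp(score_k),
    normalized over all k. Assumes z >= 0, so the current position always
    keeps a positive weight and the normalizer is never zero.
    """
    # Subtract the max score for numerical stability; it cancels on normalizing.
    top = max(scores)
    unnormalized = [
        soft_span_mask(k, z, ramp) * math.exp(s - top)
        for k, s in enumerate(scores)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


weights = masked_attention_weights([0.0] * 6, z=2.0, ramp=2.0)
print([round(w, 4) for w in weights])
```

With six equal scores, `z = 2.0` and `ramp = 2.0`, the keys at distances 0 to 2 get full mask weight, distance 3 gets half, and distances 4 and 5 get exactly zero. Because the mask is exactly zero beyond distance `z + ramp`, those keys could be skipped entirely, which is one way a short learned span would save memory and compute; during training, `z` would be a parameter receiving gradients through the ramp region.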

Bibliographic details

Source: Annual Meeting of the Association for Computational Linguistics, 2019, pp. 331-335 (5 pages)
Authors: Sainbayar Sukhbaatar; Edouard Grave; Piotr Bojanowski; Armand Joulin
Original format: PDF
Date added: 2022-08-26 13:54:08

Similar documents

1. 250 kA compact linear transformer driver for wire array z-pinch loads [J]. S. C. Bott, D. M. Haas, R. E. Madden, Physical Review. Accelerators and Beams, 2011, No. 5

2. Modulation of the attentional span by foveal and parafoveal task load: An ERP study using attentional probes [J]. Kornrumpf Benthe, Sommer Werner, Psychophysiology, 2015, No. 9

3. Examining the Relative Contribution of Memory Updating, Attention Focus Switching, and Sustained Attention to Children’s Verbal Working Memory Span [J]. Beula M. Magimairaj, James W. Montgomery, Child Development Research, 2013, No. 3

4. Adaptive Attention Span in Transformers [C]. Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Annual Meeting of the Association for Computational Linguistics, 2019

5. The Contributing Role of Processing Speed and Attention Span on Learning Efficiency in a Cohort of Clinically Referred Female Cancer Survivors [D]. Walters, Darrell M., 2020

6. Component Analysis of Simple Span vs. Complex Span Adaptive Working Memory Exercises: A Randomized Controlled Trial [O]. Bradley S. Gibson, William G. Kronenberger, Dawn M. Gondoli

7. Span of Control and Span of Attention [O]. Oriana Bandiera, Andrea Prat, Raffaella Sadun, 2013

8. Experimental Investigation of Improving Human Problem-Solving Performance by Guiding Attention and Adaptively Proving Details on Information Displays [R]. Narayanan, N. H., 2007
