Self-Attention Networks Can Process Bounded Hierarchical Languages

Abstract

Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as Dyck_κ, the language consisting of well-nested parentheses of κ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process Dyck_(κ,D), the subset of Dyck_κ with depth bounded by D, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with D + 1 layers and O(log κ) memory size (per token per layer) that recognizes Dyck_(κ,D), and a soft-attention network with two layers and O(log κ) memory size that generates Dyck_(κ,D). Experiments show that self-attention networks trained on Dyck_(κ,D) generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
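
For concreteness, the sketch below is a minimal stack-based membership check for Dyck_(κ,D), the language the abstract describes: well-nested brackets of κ types whose nesting depth never exceeds D. This is only the textbook definition of the language, not the paper's self-attention construction; the function name and the (type, is_open) token encoding are illustrative assumptions.

def is_dyck_k_d(tokens, k, D):
    """Check membership in Dyck_(k,D): well-nested brackets of k types
    whose nesting depth never exceeds D.

    Tokens are pairs (bracket_type, is_open); this encoding is an
    illustrative assumption, not the paper's input format.
    """
    stack = []
    for bracket_type, is_open in tokens:
        if not 0 <= bracket_type < k:
            return False                # unknown bracket type
        if is_open:
            stack.append(bracket_type)
            if len(stack) > D:          # depth bound D violated
                return False
        else:
            if not stack or stack[-1] != bracket_type:
                return False            # mismatched or unmatched close
            stack.pop()
    return not stack                    # accept only if every bracket is closed

# "( [ ] )" with k = 2, D = 2 -> True
print(is_dyck_k_d([(0, True), (1, True), (1, False), (0, False)], k=2, D=2))
# "( ( ( ) ) )" with D = 2 -> False, since the depth reaches 3
print(is_dyck_k_d([(0, True), (0, True), (0, True),
                   (0, False), (0, False), (0, False)], k=2, D=2))

A sequential recognizer like this has to carry the stack in its state, which is essentially what a recurrent network must do; the abstract's claim is that a hard-attention network instead needs only D + 1 layers and O(log κ) memory per token per layer, the memory advantage its experiments measure against recurrent networks.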