Self-Attention Networks Can Process Bounded Hierarchical Languages

Abstract

Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as Dyck_κ, the language consisting of well-nested parentheses of κ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process Dyck_(κ,D), the subset of Dyck_κ with depth bounded by D, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with D + 1 layers and O(log κ) memory size (per token per layer) that recognizes Dyck_(κ,D), and a soft-attention network with two layers and O(log κ) memory size that generates Dyck_(κ,D). Experiments show that self-attention networks trained on Dyck_(κ,D) generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
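
For concreteness, the sketch below is a minimal stack-based membership check for Dyck_(κ,D), the language the abstract describes: well-nested brackets of κ types whose nesting depth never exceeds D. This is only the textbook definition of the language, not the paper's self-attention construction; the function name and the (type, is_open) token encoding are illustrative assumptions.

def is_dyck_k_d(tokens, k, D):
    """Check membership in Dyck_(k,D): well-nested brackets of k types
    whose nesting depth never exceeds D.

    Tokens are pairs (bracket_type, is_open); this encoding is an
    illustrative assumption, not the paper's input format.
    """
    stack = []
    for bracket_type, is_open in tokens:
        if not 0 <= bracket_type < k:
            return False                # unknown bracket type
        if is_open:
            stack.append(bracket_type)
            if len(stack) > D:          # depth bound D violated
                return False
        else:
            if not stack or stack[-1] != bracket_type:
                return False            # mismatched or unmatched close
            stack.pop()
    return not stack                    # accept only if every bracket is closed

# "( [ ] )" with k = 2, D = 2 -> True
print(is_dyck_k_d([(0, True), (1, True), (1, False), (0, False)], k=2, D=2))
# "( ( ( ) ) )" with D = 2 -> False, since the depth reaches 3
print(is_dyck_k_d([(0, True), (0, True), (0, True),
                   (0, False), (0, False), (0, False)], k=2, D=2))

A sequential recognizer like this has to carry the stack in its state, which is essentially what a recurrent network must do; the abstract's claim is that a hard-attention network instead needs only D + 1 layers and O(log κ) memory per token per layer, the memory advantage its experiments measure against recurrent networks.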