Conference on Empirical Methods in Natural Language Processing

On the Ability and Limitations of Transformers to Recognize Formal Languages

Abstract

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages as well as the role of their individual components in doing so. We first provide a construction of Transformers for a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations. In experiments, we find that Transformers do well on this subclass, and their learned mechanism strongly correlates with our construction. Perhaps surprisingly, in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as we make the languages more complex according to a well-known measure of complexity. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and the influence of positional encoding schemes on the learning and generalization abilities of the model.
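For readers unfamiliar with the counter languages named in the abstract, the following minimal Python sketch (an illustration, not part of the paper) shows why Dyck-1, the language of balanced parentheses, is recognizable with a single counter: increment on '(', decrement on ')', reject if the counter ever goes negative, and accept only if it ends at zero.

def is_dyck1(s: str) -> bool:
    # Single-counter check for Dyck-1 (balanced parentheses).
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1          # open bracket: count up
        elif ch == ")":
            depth -= 1          # close bracket: count down
            if depth < 0:       # closed more brackets than were opened
                return False
        else:
            return False        # symbol outside the Dyck-1 alphabet
    return depth == 0           # accept only if every bracket is closed

# Examples: "(()())" is in Dyck-1, "())(" is not.
assert is_dyck1("(()())")
assert not is_dyck1("())(")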