Conference on Empirical Methods in Natural Language Processing

On the Ability and Limitations of Transformers to Recognize Formal Languages

Abstract

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages as well as the role of their individual components in doing so. We first provide a construction of Transformers for a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations. In experiments, we find that Transformers do well on this subclass, and their learned mechanism strongly correlates with our construction. Perhaps surprisingly, in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as we make the languages more complex according to a well-known measure of complexity. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and the influence of positional encoding schemes on the learning and generalization abilities of the model.
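For readers unfamiliar with the counter languages named in the abstract, the following minimal Python sketch (an illustration, not part of the paper) shows why Dyck-1, the language of balanced parentheses, is recognizable with a single counter: increment on '(', decrement on ')', reject if the counter ever goes negative, and accept only if it ends at zero.

def is_dyck1(s: str) -> bool:
    # Single-counter check for Dyck-1 (balanced parentheses).
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1          # open bracket: count up
        elif ch == ")":
            depth -= 1          # close bracket: count down
            if depth < 0:       # closed more brackets than were opened
                return False
        else:
            return False        # symbol outside the Dyck-1 alphabet
    return depth == 0           # accept only if every bracket is closed

# Examples: "(()())" is in Dyck-1, "())(" is not.
assert is_dyck1("(()())")
assert not is_dyck1("())(")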