Tree Transformer: Integrating Tree Structures into Self-Attention

Abstract

Pre-training a Transformer on large-scale raw text and fine-tuning it on the desired task have achieved state-of-the-art results on diverse NLP tasks. However, it is unclear what the learned attention captures. The attention computed by attention heads seems not to match human intuitions about hierarchical structures. This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage them to follow tree structures. The tree structures can be automatically induced from raw text by our proposed "Constituent Attention" module, which is simply implemented by self-attention between adjacent words. With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and learning more explainable attention scores.
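The abstract only describes Constituent Attention at a high level: adjacent words attend to each other, the resulting link probabilities form a soft constituent prior, and that prior gates the regular self-attention weights. The sketch below illustrates one way such a mechanism could look; it is not the authors' released implementation, and the helper names (`constituent_prior`, `tree_attention`), tensor shapes, and the exact link-probability formula are assumptions drawn from this description.

```python
# Minimal sketch of a constituent-prior-gated attention layer (assumption,
# not the official Tree Transformer code).
import torch
import torch.nn.functional as F

def constituent_prior(q, k, prev_a=None):
    """q, k: (batch, seq_len, d). Returns (C, a) where C[b, i, j] is a soft
    probability that positions i..j belong to the same constituent and
    a[b, i] is the link strength between adjacent words i and i+1."""
    b, n, d = q.shape
    # neighboring attention: each word scores only its immediate neighbors
    s_right = (q[:, :-1] * k[:, 1:]).sum(-1) / d ** 0.5    # word i -> i+1
    s_left = (q[:, 1:] * k[:, :-1]).sum(-1) / d ** 0.5     # word i+1 -> i
    neg = torch.full((b, 1), float("-inf"))                # pad sentence ends
    left = torch.cat([neg, s_left], dim=1)                 # word i's left score
    right = torch.cat([s_right, neg], dim=1)               # word i's right score
    p = F.softmax(torch.stack([left, right], dim=-1), dim=-1)
    # mutual link probability between adjacent words i and i+1
    a = (p[:, :-1, 1] * p[:, 1:, 0]).sqrt()                # (b, n-1)
    if prev_a is not None:                                 # constituents may only
        a = prev_a + (1.0 - prev_a) * a                    # grow across layers
    # constituent prior: C[i, j] = product of link strengths between i and j
    cum = F.pad(torch.cumsum(torch.log(a + 1e-9), dim=1), (1, 0))   # (b, n)
    upper = torch.exp(cum.unsqueeze(1) - cum.unsqueeze(2)).triu(1)
    C = upper + upper.transpose(1, 2) + torch.eye(n)       # symmetric, diag = 1
    return C, a

def tree_attention(q, k, v, prev_a=None):
    """Scaled dot-product attention gated elementwise by the constituent prior."""
    C, a = constituent_prior(q, k, prev_a)
    scores = q @ k.transpose(1, 2) / q.shape[-1] ** 0.5
    attn = C * F.softmax(scores, dim=-1)                   # elementwise gate
    attn = attn / (attn.sum(-1, keepdim=True) + 1e-9)      # renormalize rows
    return attn @ v, a

# toy usage: one layer over a batch of 2 sentences of 5 tokens each
q, k, v = (torch.randn(2, 5, 16) for _ in range(3))
out, a = tree_attention(q, k, v)
print(out.shape, a.shape)   # torch.Size([2, 5, 16]) torch.Size([2, 4])
```

Passing `a` from one layer into the next as `prev_a` is what would make the induced constituents grow monotonically with depth, so lower layers capture small phrases and higher layers merge them into larger spans.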
