Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search



Abstract

Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We perform analysis of our approach on two tasks: CCG supertagging and Machine Translation, and demonstrate the importance of global normalization under different conditions while using search-aware training.
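For readers new to the terminology, the contrast the abstract draws can be sketched as follows (notation ours, not taken from the paper). A locally normalized model applies a per-step softmax over the vocabulary V, while a globally normalized model scores whole output sequences and divides by a single sequence-level partition function:

p_{\mathrm{local}}(y \mid x) = \prod_{t=1}^{T} \frac{\exp s(y_t \mid y_{<t}, x)}{\sum_{y' \in V} \exp s(y' \mid y_{<t}, x)}

p_{\mathrm{global}}(y \mid x) = \frac{\exp\left( \sum_{t=1}^{T} s(y_t \mid y_{<t}, x) \right)}{Z(x)}, \qquad Z(x) = \sum_{y' \in \mathcal{Y}(x)} \exp\left( \sum_{t=1}^{T} s(y'_t \mid y'_{<t}, x) \right)

Because Z(x) ranges over the full set of candidate output sequences \mathcal{Y}(x), it is intractable to compute exactly; training and decoding therefore rely on (possibly relaxed) beam search rather than exact normalization, which is why inexact search interacts with the choice of normalization in the first place.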