Venue: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search



Abstract

Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We analyze our approach on two tasks, CCG supertagging and machine translation, and demonstrate the importance of global normalization under different conditions while using search-aware training.
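To make the distinction between the two model classes concrete, here is a minimal NumPy sketch (not the paper's model) contrasting the two scoring schemes on a toy output space: a locally normalized model applies a softmax at every step and sums log-probabilities, while a globally normalized model sums raw potentials and normalizes once over all candidate sequences. With prefix-independent per-step scores, as in this toy, the two coincide because the partition function factorizes; they diverge (and label bias can arise) once the scores depend on the decoding history.

```python
import itertools
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over a vector of scores.
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

# Toy setup: sequences of length T over a vocabulary of size V, with
# fixed per-step scores s[t, v]. A real recurrent model would condition
# these on the input and the decoded prefix; here they are constants.
rng = np.random.default_rng(0)
T, V = 3, 4
s = rng.normal(size=(T, V))

def local_log_prob(y):
    # Locally normalized: per-step softmax, then sum the log-probs.
    return sum(log_softmax(s[t])[y[t]] for t in range(T))

def global_log_prob(y):
    # Globally normalized: sum raw potentials, normalize once over the
    # whole output space (enumerable only in this toy example; the paper
    # approximates this with a continuous relaxation of beam search).
    score = lambda seq: sum(s[t][seq[t]] for t in range(T))
    all_scores = [score(seq) for seq in itertools.product(range(V), repeat=T)]
    log_Z = np.logaddexp.reduce(all_scores)
    return score(y) - log_Z

y = (0, 2, 1)
# With prefix-independent scores the two normalizations agree exactly.
print(np.isclose(local_log_prob(y), global_log_prob(y)))
```

The interesting regime, which this sketch deliberately excludes, is when `s[t]` depends on the chosen prefix: then the local per-step normalization constrains probability mass at each step, whereas the global model defers normalization to the sequence level.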
