Conference on Empirical Methods in Natural Language Processing

Length bias in Encoder Decoder Models and a Case for Global Conditioning



Abstract

Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models like CRFs that typically assumed conditional independence among non-adjacent variables. However, in practice, encoder-decoder models exhibit a bias towards short sequences that, surprisingly, gets worse with increasing beam size. In this paper we show that this phenomenon is due to a discrepancy between the full-sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy impacts long sequences more adversely, which explains the bias towards predicting short sequences. For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for beam search during inference, reducing it to an efficient dot-product-based search in a vector space.
