
Length bias in Encoder Decoder Models and a Case for Global Conditioning



Abstract

Encoder-decoder networks are popular for probabilistic sequence modeling in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models such as CRFs, which typically assumed conditional independence among non-adjacent variables. In practice, however, encoder-decoder models exhibit a bias towards short sequences that, surprisingly, gets worse with increasing beam size. In this paper we show that this phenomenon is due to a discrepancy between the full-sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy impacts long sequences more adversely, explaining the bias towards predicting short sequences. For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for beam search during inference, which reduces to an efficient dot-product-based search in a vector space.
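A minimal numeric sketch (not from the paper) of the length bias the abstract describes: under a locally normalized model, a sequence's score is the sum of per-step log-probabilities, so every extra token can only lower it, and a short, merely adequate candidate can outscore a long, uniformly confident one.

```python
# Toy illustration (assumed numbers, not the paper's data) of why locally
# normalized scoring favors short sequences.
import math

short_seq = [0.6, 0.9]                       # 2 tokens, each step fairly confident
long_seq = [0.9, 0.9, 0.9, 0.9, 0.9, 0.9]    # 6 tokens, every step very confident

score_short = sum(math.log(p) for p in short_seq)  # ~ -0.616
score_long = sum(math.log(p) for p in long_seq)    # ~ -0.632

# The short sequence wins even though the long one is more confident at
# every single step; a wider beam keeps more such short candidates alive,
# consistent with the bias worsening as beam size grows.
print(score_short > score_long)  # True
```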
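The final claim, that inference over a closed output set reduces to a dot-product search, can be sketched as below. The encoder, embedding dimension, and candidate set are hypothetical placeholders rather than the paper's actual architecture; the point is only that once every candidate sequence has a fixed embedding, beam search is replaced by one matrix-vector product and an argmax.

```python
# Hedged sketch of dot-product inference over a closed candidate set.
import numpy as np

rng = np.random.default_rng(0)
d = 128                   # shared embedding dimension (assumed)
num_candidates = 10_000   # size of the closed candidate set (assumed)

# Precomputed embeddings of every candidate output sequence (placeholder data).
candidate_embeddings = rng.standard_normal((num_candidates, d))

def encode_input(x_placeholder):
    """Stand-in for the LSTM encoder mapping an input to a d-dim vector."""
    return rng.standard_normal(d)

query = encode_input("source sequence")
scores = candidate_embeddings @ query   # one dot product per candidate
best = int(np.argmax(scores))           # exact search, no beam needed
print(best, scores[best])
```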
