European Signal Processing Conference

Adieu recurrence? End-to-end speech emotion recognition using a context stacking dilated convolutional network



Abstract

In state-of-the-art end-to-end Speech Emotion Recognition (SER) systems, Convolutional Neural Network (CNN) layers are typically used to extract affective features while Long Short-Term Memory (LSTM) layers model long-term temporal dependencies. However, these systems suffer from several problems: 1) the model largely ignores temporal structure in speech due to the limited receptive field of the CNN layers, and 2) the model inherits the drawbacks of Recurrent Neural Networks (RNNs), e.g. the exploding/vanishing gradient problem, the polynomial growth of computation time with the input sequence length, and the lack of parallelizability. In this work, we propose a novel end-to-end SER structure that does not contain any recurrent or fully connected layers. By leveraging the power of the dilated causal convolution, the receptive field of the proposed model increases substantially at reasonably low computational cost. By also using context stacking, the proposed model is capable of exploiting long-term temporal dependencies and can serve as an alternative to RNNs. Experiments on the publicly available partition of the RECOLA database show improved results compared to a state-of-the-art system. We also verify that both the proposed model and the state-of-the-art model, learned from short sequences (i.e. 20 s), can make accurate predictions for very long sequences (e.g. ≥ 75 s).
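The abstract ships no code, but the core building block it refers to, a dilated causal convolution whose receptive field grows exponentially with depth, is easy to illustrate. The PyTorch sketch below is only a rough approximation under assumed settings: the channel count, kernel size, number of layers, and the two-output head (arousal and valence, the dimensions annotated in RECOLA) are illustrative choices rather than the authors' configuration, and the context-stacking mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalDilatedConv1d(nn.Module):
    """1-D causal convolution: left-pad the input so no output frame
    can see future frames."""

    def __init__(self, channels, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))       # pad the past only (causal)
        return self.conv(x)


class DilatedStack(nn.Module):
    """Stack of dilated causal convolutions with residual connections.
    Dilations double per layer, so the receptive field grows exponentially:
    1 + (kernel_size - 1) * (2**num_layers - 1) frames."""

    def __init__(self, channels=64, kernel_size=2, num_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            CausalDilatedConv1d(channels, kernel_size, dilation=2 ** i)
            for i in range(num_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x)) + x       # residual connection
        return x


# A fully convolutional prediction head: 1x1 convolutions act per time step,
# so there are no recurrent or fully connected layers anywhere in the model.
model = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=1),           # single-channel input features
    DilatedStack(channels=64, kernel_size=2, num_layers=8),
    nn.Conv1d(64, 2, kernel_size=1),           # e.g. arousal and valence per frame
)

x = torch.randn(4, 1, 1600)                    # (batch, channels, frames)
print(model(x).shape)                          # torch.Size([4, 2, 1600])
```

With kernel size 2 and dilations 1, 2, ..., 2^7, the eight layers above already cover 256 past frames per output without any recurrence, which is the property the paper exploits to replace the LSTM layers.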
