Conference: International Conference on Communications, Information System and Computer Engineering

Robust Speech Emotion Recognition for Sindhi Language based on Deep Convolutional Neural Network

Abstract

Speech emotion recognition (SER) is a challenging task because emotions are expressed differently across languages. This paper proposes an SER approach aimed at improving robustness for low-resource languages such as Sindhi. To the best of our knowledge, this is the first SER work on the Sindhi language that uses data augmentation (DA) and deep learning techniques. The proposed method first applies the optimally modified log-spectral amplitude (OMLSA) estimator to suppress noise in the speech data. Second, to deal with the imbalanced or limited datasets typical of low-resource languages, a DA technique combining different prosodic features (i.e., time stretching, pitch, and white noise) is proposed. SER is then performed with the proposed one-dimensional convolutional neural network (1DCNN) model. We further contribute a novel Sindhi speech emotion dataset (NSSED) consisting of 1231 audio files categorized into four emotions (i.e., happy, sad, angry, and neutral). To demonstrate the superior performance and cross-lingual adaptability of the proposed method, it is compared with two other methods, the support vector machine (SVM) and long short-term memory (LSTM), on both the NSSED and Urdu-language datasets. Experimental results show that the proposed method achieves up to 91% and 88% accuracy on the NSSED and Urdu datasets, respectively, an increase of approximately 22% over the baseline model.
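
The augmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the function names (`add_white_noise`, `resample_linear`, `augment`), the 20 dB signal-to-noise ratio, and the 0.9 and 1.1 rates are hypothetical choices, not the authors' settings. The resampling shown is also a crude stand-in, since it changes duration and pitch together; varying them independently, as separate time-stretching and pitch operations would, requires a more elaborate method such as a phase vocoder.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return a copy of `signal` with Gaussian white noise at the given SNR in dB."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def resample_linear(signal, rate):
    """Resample by linear interpolation to about len(signal) / rate samples.

    Played back at the original sample rate, rate > 1 shortens the clip and
    raises its pitch; rate < 1 lengthens it and lowers its pitch.
    """
    n_out = max(1, int(round(len(signal) / rate)))
    last = len(signal) - 1
    out = []
    for i in range(n_out):
        pos = min(i * rate, last)      # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def augment(signal, rng):
    """Produce three augmented variants of one clip (hypothetical settings)."""
    return [
        add_white_noise(signal, snr_db=20.0, rng=rng),
        resample_linear(signal, rate=0.9),   # slower and lower
        resample_linear(signal, rate=1.1),   # faster and higher
    ]


# One second of a 220 Hz test tone at 16 kHz stands in for a speech clip.
sr = 16000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
variants = augment(tone, random.Random(0))
print([len(v) for v in variants])
```

Each variant would keep the emotion label of its source clip, which is how augmentation of this kind can enlarge a small or imbalanced training set without new recordings.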