Automatic emotion recognition in noisy, coded and narrow-band speech



Abstract

This thesis addresses an important research gap concerning the effects of real-life conditions, including coded, narrow-band and noisy speech signals, on automatic emotion recognition (AER) from speech. In addition, the study investigates efficient methods of reducing the possible detrimental effects of speech compression on AER. The thesis consists of two parts. The first part investigates the effects of noise, data compression and bandwidth reduction on AER from speech signals. The second part investigates the application of AER based on speech spectrograms (SS) and Artificial Bandwidth Extension (ABE) to improve the robustness and accuracy of emotion recognition from speech signals under these potentially undesirable conditions.

The effects of the adaptive multi-rate (AMR), adaptive multi-rate wideband (AMR-WB), extended adaptive multi-rate wideband (AMR-WB+) and MP3 speech compression methods are compared against emotion recognition from uncompressed speech. Noisy conditions are simulated by adding Gaussian white noise to speech signals at different signal-to-noise ratio (SNR) values. Bandwidth reduction is tested using speech filtering. The AER methods include techniques based on acoustic speech parameters: mel-frequency cepstral coefficients (MFCCs), Teager energy operator and perceptual wavelet packet (TEO-PWP) features, and glottal time-domain and frequency-domain features (GP-T and GP-F), as well as spectrogram image (SS) parameters, spectrogram critical band scale (SS-CB) and spectrogram Bark scale (SS-Bark). The acoustic classes are modelled using the Gaussian Mixture Model (GMM), and all experiments use the same Berlin Emotional Speech database. The ABE of narrow-band speech is performed using spectral folding and spectral envelope estimation methods.

The major findings described in this thesis indicate that:

1. Standard speech compression methods such as AMR, AMR-WB, AMR-WB+ and MP3 have a significant effect on AER and in general lead to a significant degradation of AER accuracy.
2. Low-frequency components (0 kHz to 1 kHz) of speech, which contain the fundamental frequency information, as well as high-frequency components (above 4 kHz), have a key effect on the accuracy of AER.
3. A significant reduction of AER accuracy was observed for uncompressed speech modified to simulate a typical mild-to-moderate high-frequency hearing loss. This accuracy was further reduced when the modified speech was compressed.
4. The addition of noise to either uncompressed or compressed speech reduces the accuracy of AER. Under noisy conditions, the best performing features were MFCCs and the best performing speech compression algorithm was AMR-WB.
5. The detrimental effects of speech compression can be mitigated using AER based on speech spectrogram features.
6. Extending the bandwidth of narrow-band AMR-compressed speech can improve AER accuracy.
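The noisy-condition setup mentioned in the abstract (Gaussian white noise added at a chosen SNR) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual procedure or SNR values, so the function names, the test tone and the SNR levels below are assumptions chosen for illustration only. The noise power follows from the definition SNR_dB = 10 log10(P_signal / P_noise).

```python
import math
import random


def add_white_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with Gaussian white noise at the given SNR (dB)."""
    if not signal:
        raise ValueError("signal must contain at least one sample")
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Assumption: a 200 Hz tone sampled at 16 kHz stands in for a speech signal.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]

# Hypothetical SNR levels; the abstract does not list the values that were tested.
for target in (20, 10, 0):
    noisy = add_white_noise(clean, target, seed=0)
    print(target, round(measured_snr_db(clean, noisy), 1))
```

The measured SNR should land close to the requested value because the noise variance is set directly from the clean signal's average power. With real speech, the power would normally be computed over the whole utterance or over active speech frames only, and that choice changes the effective SNR.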
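The spectral-folding step of ABE named in the abstract can also be sketched. This is a generic illustration of the technique, not the thesis's implementation: inserting a zero after every sample doubles the sampling rate and mirrors the narrow-band spectrum about the old Nyquist frequency, which fills the empty upper band. The function names and the 1 kHz test tone are assumptions, and the spectral envelope estimation that would shape the mirrored band is left out.

```python
import cmath
import math


def spectral_fold(narrowband):
    """Upsample by 2 via zero insertion, mirroring the spectrum about the old Nyquist frequency."""
    wideband = []
    for sample in narrowband:
        wideband.append(sample)
        wideband.append(0.0)
    return wideband


def tone_magnitude(signal, freq_hz, sample_rate):
    """Magnitude of a single DFT component at freq_hz (direct evaluation)."""
    step = -2j * math.pi * freq_hz / sample_rate
    return abs(sum(s * cmath.exp(step * n) for n, s in enumerate(signal)))


# Assumption: a 1 kHz tone sampled at 8 kHz stands in for narrow-band speech.
narrow_rate = 8000
narrow = [math.cos(2 * math.pi * 1000 * n / narrow_rate) for n in range(160)]
wide = spectral_fold(narrow)
wide_rate = 2 * narrow_rate

original = tone_magnitude(wide, 1000, wide_rate)  # component kept from the input
image = tone_magnitude(wide, 7000, wide_rate)     # mirror image at 8000 - 1000 Hz
print(round(original, 3), round(image, 3))
```

The original tone and its mirror image come out with equal magnitude, which is why a practical system attenuates and shapes the folded band (for example with an estimated spectral envelope) before using it.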


Bibliographic details

Author: Albahri A
Year: 2016
Original format: PDF

