Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering



Abstract

Due to the profound differences between the acoustic characteristics of neutral and whispered speech, the performance of traditional automatic speech recognition (ASR) systems trained on neutral speech degrades significantly when whispered speech is applied. In order to analyze this mismatched train/test situation in depth and to develop an efficient approach to whisper recognition, this study first analyzes the acoustic characteristics of whispered speech, addresses the problems of whispered speech recognition in mismatched conditions, and then proposes new robust cepstral features and a preprocessing approach based on a deep denoising autoencoder (DDAE) that enhance whisper recognition. The experimental results confirm that Teager-energy-based cepstral features, especially TECCs, are more robust and better whisper descriptors than traditional Mel-frequency cepstral coefficients (MFCCs). Further detailed analysis of cepstral distances, distributions of cepstral coefficients, and confusion matrices, together with inverse filtering experiments, proves that voicing in the speech stimuli is the main cause of word misclassification in mismatched train/test scenarios. The new framework based on DDAE and TECC features significantly improves whisper recognition accuracy and outperforms the traditional MFCC and GMM-HMM (Gaussian mixture density hidden Markov model) baseline, yielding an absolute 31% improvement in whisper recognition accuracy. The achieved word recognition rate in the neutral/whisper scenario is 92.81%.
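To make the Teager-energy-based features concrete, the sketch below computes TECC-style coefficients: the signal is filtered by a bank of mel-spaced band-pass filters, the Teager energy operator psi[x(n)] = x(n)^2 - x(n-1)*x(n+1) is applied to each band output, and the frame-averaged band energies are log-compressed and decorrelated with a DCT. The filterbank type, frame settings, and number of coefficients used here are illustrative assumptions, not the exact configuration of the paper.

```python
# Minimal sketch of Teager-energy-based cepstral features (TECC-style).
# Assumptions (not from the paper): Butterworth band-pass filterbank with
# mel-spaced edges, 25 ms frames with a 10 ms hop, 13 output coefficients.
import numpy as np
from scipy.signal import butter, sosfilt
from scipy.fftpack import dct

def teager_energy(x):
    """Teager energy operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return np.abs(psi)

def mel_bands(n_bands, f_low, f_high):
    """Mel-spaced (low, high) band edges in Hz."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(mel(f_low), mel(f_high), n_bands + 1))
    return list(zip(edges[:-1], edges[1:]))

def tecc(signal, fs, n_bands=26, n_ceps=13, frame_len=0.025, hop=0.010):
    frame, step = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(signal) - frame) // step
    band_energy = np.zeros((n_frames, n_bands))
    for b, (lo, hi) in enumerate(mel_bands(n_bands, 100.0, 0.95 * fs / 2)):
        sos = butter(2, [lo, hi], btype="band", output="sos", fs=fs)
        psi = teager_energy(sosfilt(sos, signal))
        for t in range(n_frames):
            band_energy[t, b] = np.mean(psi[t * step:t * step + frame]) + 1e-10
    # Log compression followed by a DCT, as in conventional cepstral analysis.
    return dct(np.log(band_energy), type=2, axis=1, norm="ortho")[:, :n_ceps]

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)      # 1 s of noise as a stand-in for speech
    print(tecc(x, fs).shape)     # (n_frames, 13)
```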
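The DDAE preprocessing step can be read as learning a mapping from whispered-speech feature frames toward their neutral-speech counterparts, which are then passed to the recognizer. Below is a minimal PyTorch sketch of such a denoising autoencoder; the layer sizes, training loop, and the assumption of frame-aligned whispered/neutral feature pairs are illustrative and not taken from the paper.

```python
# Minimal sketch of DDAE-based feature enhancement, assuming paired
# whispered/neutral cepstral feature frames (e.g. TECCs) are available.
# Layer sizes and training settings are illustrative assumptions.
import torch
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, n_feats=13, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_feats),   # reconstruct neutral-style features
        )

    def forward(self, x):
        return self.net(x)

def train_ddae(whisper_feats, neutral_feats, epochs=50, lr=1e-3):
    """whisper_feats, neutral_feats: float tensors of shape (n_frames, n_feats)."""
    model = DDAE(n_feats=whisper_feats.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(whisper_feats), neutral_feats)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Random stand-ins for frame-aligned whispered / neutral feature frames.
    whisper = torch.randn(1000, 13)
    neutral = torch.randn(1000, 13)
    ddae = train_ddae(whisper, neutral)
    enhanced = ddae(whisper)      # enhanced features for the HMM recognizer
    print(enhanced.shape)         # torch.Size([1000, 13])
```

At test time, in the spirit of the preprocessing described above, whispered-speech features would be passed through the trained DDAE before decoding with the HMM-based recognizer.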
