Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering



Abstract

Due to the profound differences between the acoustic characteristics of neutral and whispered speech, the performance of traditional automatic speech recognition (ASR) systems trained on neutral speech degrades significantly when whispered speech is applied. In order to analyze this mismatched train/test situation in depth and to develop an efficient approach to whisper recognition, this study first analyzes the acoustic characteristics of whispered speech, addresses the problems of whispered speech recognition in mismatched conditions, and then proposes new robust cepstral features and a preprocessing approach based on a deep denoising autoencoder (DDAE) that enhance whisper recognition. The experimental results confirm that Teager-energy-based cepstral features, especially TECCs, are more robust and better whisper descriptors than traditional Mel-frequency cepstral coefficients (MFCCs). Further detailed analysis of cepstral distances, distributions of cepstral coefficients, and confusion matrices, together with inverse filtering experiments, proves that voicing in the speech stimuli is the main cause of word misclassification in mismatched train/test scenarios. The new framework based on DDAE and TECC features significantly improves whisper recognition accuracy and outperforms the traditional MFCC and GMM-HMM (Gaussian mixture density hidden Markov model) baseline, yielding an absolute 31% improvement in whisper recognition accuracy. The achieved word recognition rate in the neutral/whisper scenario is 92.81%.
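To make the Teager-energy-based features concrete, the sketch below computes TECC-style coefficients: the signal is filtered by a bank of mel-spaced band-pass filters, the Teager energy operator psi[x(n)] = x(n)^2 - x(n-1)*x(n+1) is applied to each band output, and the frame-averaged band energies are log-compressed and decorrelated with a DCT. The filterbank type, frame settings, and number of coefficients used here are illustrative assumptions, not the exact configuration of the paper.

```python
# Minimal sketch of Teager-energy-based cepstral features (TECC-style).
# Assumptions (not from the paper): Butterworth band-pass filterbank with
# mel-spaced edges, 25 ms frames with a 10 ms hop, 13 output coefficients.
import numpy as np
from scipy.signal import butter, sosfilt
from scipy.fftpack import dct

def teager_energy(x):
    """Teager energy operator: psi[x(n)] = x(n)^2 - x(n-1) * x(n+1)."""
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return np.abs(psi)

def mel_bands(n_bands, f_low, f_high):
    """Mel-spaced (low, high) band edges in Hz."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(mel(f_low), mel(f_high), n_bands + 1))
    return list(zip(edges[:-1], edges[1:]))

def tecc(signal, fs, n_bands=26, n_ceps=13, frame_len=0.025, hop=0.010):
    frame, step = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(signal) - frame) // step
    band_energy = np.zeros((n_frames, n_bands))
    for b, (lo, hi) in enumerate(mel_bands(n_bands, 100.0, 0.95 * fs / 2)):
        sos = butter(2, [lo, hi], btype="band", output="sos", fs=fs)
        psi = teager_energy(sosfilt(sos, signal))
        for t in range(n_frames):
            band_energy[t, b] = np.mean(psi[t * step:t * step + frame]) + 1e-10
    # Log compression followed by a DCT, as in conventional cepstral analysis.
    return dct(np.log(band_energy), type=2, axis=1, norm="ortho")[:, :n_ceps]

if __name__ == "__main__":
    fs = 16000
    x = np.random.randn(fs)      # 1 s of noise as a stand-in for speech
    print(tecc(x, fs).shape)     # (n_frames, 13)
```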
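The DDAE preprocessing step can be read as learning a mapping from whispered-speech feature frames toward their neutral-speech counterparts, which are then passed to the recognizer. Below is a minimal PyTorch sketch of such a denoising autoencoder; the layer sizes, training loop, and the assumption of frame-aligned whispered/neutral feature pairs are illustrative and not taken from the paper.

```python
# Minimal sketch of DDAE-based feature enhancement, assuming paired
# whispered/neutral cepstral feature frames (e.g. TECCs) are available.
# Layer sizes and training settings are illustrative assumptions.
import torch
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, n_feats=13, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_feats, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_feats),   # reconstruct neutral-style features
        )

    def forward(self, x):
        return self.net(x)

def train_ddae(whisper_feats, neutral_feats, epochs=50, lr=1e-3):
    """whisper_feats, neutral_feats: float tensors of shape (n_frames, n_feats)."""
    model = DDAE(n_feats=whisper_feats.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(whisper_feats), neutral_feats)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Random stand-ins for frame-aligned whispered / neutral feature frames.
    whisper = torch.randn(1000, 13)
    neutral = torch.randn(1000, 13)
    ddae = train_ddae(whisper, neutral)
    enhanced = ddae(whisper)      # enhanced features for the HMM recognizer
    print(enhanced.shape)         # torch.Size([1000, 13])
```

At test time, in the spirit of the preprocessing described above, whispered-speech features would be passed through the trained DDAE before decoding with the HMM-based recognizer.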
