
A Front-End Technique for Automatic Noisy Speech Recognition


Abstract

Sounds in a real environment rarely occur in isolation: they form complex mixtures and usually happen concurrently. Auditory masking refers to the perceptual interaction between sound components. This paper proposes modeling the effect of simultaneous masking in the Mel frequency cepstral coefficient (MFCC) front end, which effectively improves the performance of the resulting system. In addition, Gammatone frequency integration is used to warp the energy spectrum, providing gradually decaying weights and compensating for the loss of spectral correlation. Experiments are carried out on the Aurora-2 database, and frame-level cross-entropy-based deep neural network (DNN-HMM) training is used to build the acoustic model. With models trained on multi-condition speech data, the proposed feature extraction method achieves accuracies of up to 98.14% at 10 dB SNR, 94.40% at 5 dB, 81.67% at 0 dB, and 51.5% at −5 dB.
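The Gammatone frequency integration step can be illustrated with a minimal sketch. The abstract does not give the paper's filter design, so everything here is an assumption: the Glasberg-Moore ERB formulas, the standard fourth-order approximation of the gammatone magnitude response, and the hypothetical names `gammatone_weights` and `integrate_energy`. It only shows the general idea of replacing sharp-edged filters with weights that decay gradually around each center frequency; it is not the authors' implementation, and it omits the simultaneous-masking model.

```python
import math

def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale value for a frequency in Hz."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth (Hz) at center frequency f_hz."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def gammatone_weights(n_filters, n_fft, sample_rate, f_min=50.0, order=4):
    """Return one row of weights per filter over the n_fft // 2 + 1 FFT bins.

    Assumed design: center frequencies equally spaced on the ERB-rate scale,
    and the usual approximation |H(f)|^2 ~ (1 + ((f - fc) / b)^2)^(-order)
    with b = 1.019 * ERB(fc).
    """
    e_lo = hz_to_erb_rate(f_min)
    e_hi = hz_to_erb_rate(sample_rate / 2.0)
    n_bins = n_fft // 2 + 1
    bank = []
    for i in range(n_filters):
        fc = erb_rate_to_hz(e_lo + (i + 1) * (e_hi - e_lo) / (n_filters + 1))
        b = 1.019 * erb_bandwidth(fc)
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft
            # Weight decays smoothly with distance from the center frequency.
            row.append((1.0 + ((f - fc) / b) ** 2) ** (-order))
        total = sum(row)
        bank.append([w / total for w in row])  # normalize each row to sum to 1
    return bank

def integrate_energy(power_spectrum, bank):
    """Weighted sum of the power spectrum under each filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]

# Illustrative values only: 8 filters, 256-point FFT, 8 kHz sampling.
bank = gammatone_weights(n_filters=8, n_fft=256, sample_rate=8000)
flat_spectrum = [1.0] * (256 // 2 + 1)
band_energies = integrate_energy(flat_spectrum, bank)
```

In a cepstral pipeline of this kind, the band energies would typically be log-compressed and passed through a DCT to obtain cepstral coefficients. Unlike triangular Mel filters, each weight row here is nonzero across the whole spectrum, so neighboring bands overlap smoothly.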
