Journal: IEEE Transactions on Audio, Speech and Language Processing

Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor

Abstract

We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm aims to minimize the error expressed explicitly for the Mel-frequency cepstra instead of the discrete Fourier transform (DFT) spectra, and it operates on the output of the Mel-frequency filter bank. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M log-MMSE suppressor, since the number of channels in the Mel-frequency filter bank (23 in our case) is much smaller than the number of bins in the DFT (256). We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.
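To illustrate where such a suppressor sits in the feature pipeline, the following is a minimal sketch of applying a per-channel gain to the outputs of a 23-channel Mel filter bank (rather than to 256 DFT bins) before taking the log and the DCT. It is not the algorithm of the paper: the abstract does not give the cepstral-MMSE gain formula, so a simple Wiener-style gain is substituted as a stand-in. The function names, the 8 kHz sample rate, the gain floor of 0.1, and the choice of 13 cepstral coefficients are all assumptions made for illustration.

```python
import numpy as np

N_BINS = 256      # DFT bins, as quoted in the abstract
N_CHANNELS = 23   # Mel filter-bank channels, as quoted in the abstract


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_channels=N_CHANNELS, n_bins=N_BINS, sample_rate=8000.0):
    """Triangular Mel filters, shape (n_channels, n_bins). Sample rate is assumed."""
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = mel_to_hz(np.linspace(0.0, top_mel, n_channels + 2))
    bank = np.zeros((n_channels, n_bins))
    for c in range(n_channels):
        lo, mid, hi = edges[c], edges[c + 1], edges[c + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        bank[c] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Stand-in gain (NOT the paper's cepstral-MMSE rule), one value per channel."""
    snr = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    return np.maximum(snr / (1.0 + snr), floor)


def dct_ii(x, n_coeffs=13):
    """Unnormalized DCT-II of a 1-D vector, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T


def suppressed_cepstra(power_spectrum, noise_spectrum):
    """Apply the gain in the Mel domain, then take log and DCT to get cepstra."""
    bank = mel_filter_bank()
    noisy_mel = bank @ power_spectrum   # 256 bins -> 23 channels
    noise_mel = bank @ noise_spectrum
    gain = wiener_gain(noisy_mel, noise_mel)
    return dct_ii(np.log(gain * noisy_mel)), gain


# Synthetic example: flat noise floor plus random extra energy per bin.
rng = np.random.default_rng(0)
noise = np.full(N_BINS, 1.0)
noisy = noise + rng.uniform(0.0, 5.0, N_BINS)
cepstra, gain = suppressed_cepstra(noisy, noise)
```

In this sketch the gain is computed for 23 values per frame instead of 256, which is the source of the efficiency argument made in the abstract; the gain rule itself would need to be replaced by the one derived in the paper.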
