Published in: IEEE Transactions on Audio, Speech, and Language Processing

Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments

Abstract

In the presence of environmental noise, speakers tend to adjust their speech production in an effort to preserve intelligible communication. The noise-induced speech adjustments, called Lombard effect (LE), are known to severely impact the accuracy of automatic speech recognition (ASR) systems. The reduced performance results from the mismatch between the ASR acoustic models trained typically on noise-clean neutral (modal) speech and the actual parameters of noisy LE speech. In this study, novel unsupervised frequency domain and cepstral domain equalizations that increase ASR resistance to LE are proposed and incorporated in a recognition scheme employing a codebook of noisy acoustic models. In the frequency domain, short-time speech spectra are transformed towards neutral ASR acoustic models in a maximum-likelihood fashion. Simultaneously, dynamics of cepstral samples are determined from the quantile estimates and normalized to a constant range. A codebook decoding strategy is applied to determine the noisy models best matching the actual mixture of speech and noisy background. The proposed algorithms are evaluated side by side with conventional compensation schemes on connected Czech digits presented in various levels of background car noise. The resulting system provides an absolute word error rate (WER) reduction on 10-dB signal-to-noise ratio data of 8.7% and 37.7% for female neutral and LE speech, respectively, and of 8.7% and 32.8% for male neutral and LE speech, respectively, when compared to the baseline recognizer employing perceptual linear prediction (PLP) coefficients and cepstral mean and variance normalization.
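The cepstral-domain step in the abstract (estimating the dynamics of cepstral samples from quantiles and normalizing them to a constant range) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `quantile_normalize`, the 5%/95% quantile pair, the per-utterance scope, and the target range of width 1 centered on zero are all assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def quantile_normalize(cepstra, low_q=0.05, high_q=0.95):
    """Normalize each cepstral dimension using two quantile estimates.

    cepstra: array of shape (frames, coefficients) for one utterance.
    low_q, high_q: quantile levels (assumed values, not from the paper).

    After normalization, the low quantile of every coefficient track
    maps to -0.5 and the high quantile maps to +0.5.
    """
    cepstra = np.asarray(cepstra, dtype=float)

    # Per-coefficient quantile estimates taken over the time axis.
    lo = np.quantile(cepstra, low_q, axis=0)
    hi = np.quantile(cepstra, high_q, axis=0)

    # Center on the midpoint of the two quantiles and scale by their
    # distance; the small floor guards against constant tracks.
    center = (lo + hi) / 2.0
    span = np.maximum(hi - lo, 1e-12)
    return (cepstra - center) / span


if __name__ == "__main__":
    # Synthetic stand-in for cepstral features: 200 frames, 4 coefficients.
    rng = np.random.default_rng(0)
    features = rng.normal(loc=3.0, scale=2.0, size=(200, 4))
    normalized = quantile_normalize(features)
    print(np.quantile(normalized, [0.05, 0.95], axis=0))
```

Using quantiles rather than the mean and variance makes the estimated range less sensitive to outlier frames, which is one plausible motivation for a quantile-based scheme over cepstral mean and variance normalization.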