IEEE Transactions on Speech and Audio Processing

Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises

Abstract

A feature estimation technique is proposed for speech signals degraded by both additive and convolutive noise. An EM algorithm is formulated in the frequency domain to identify the magnitude response of the distortion channel and the power spectrum of the additive noise, and posterior estimates of the short-time power spectra of speech are obtained from the identified channel and noise. The estimated posterior power spectra are used to compute perceptually based linear prediction cepstral coefficients; these cepstral features and their temporal regression coefficients are then used for automatic speech recognition with acoustic models trained on clean speech. Experiments were performed on speaker-independent continuous speech recognition, with speech data taken from the TIMIT database and degraded by a distortion channel and simulated additive noise with white or colored spectral characteristics at various SNR levels. The results indicate that the proposed technique leads to convergent identification of the channel and noise and to significantly improved recognition accuracy for speaker-independent continuous speech.
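The degradation setup described in the abstract (a distortion channel followed by simulated additive noise at a chosen SNR) can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `degrade`, the three-tap FIR channel, the sine-wave stand-in for speech, and the choice to measure SNR against the channel-filtered signal are all assumptions made for illustration, and only the white-noise case is shown.

```python
import numpy as np

def degrade(speech, channel, snr_db, rng):
    """Filter `speech` with an FIR `channel`, then add white noise at `snr_db`."""
    # Convolutive distortion: linear filtering by the channel impulse response,
    # truncated to the original length.
    filtered = np.convolve(speech, channel, mode="full")[: len(speech)]
    # Additive distortion: white Gaussian noise scaled so that the ratio of
    # filtered-signal power to noise power matches the target SNR (assumption).
    noise = rng.standard_normal(len(speech))
    signal_power = np.mean(filtered ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    noise = scale * noise
    return filtered + noise, filtered, noise

rng = np.random.default_rng(0)
# Stand-in for a speech waveform: one second of a 200 Hz tone at 16 kHz.
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
# Hypothetical channel impulse response, not taken from the paper.
channel = np.array([1.0, -0.5, 0.2])
noisy, filtered, noise = degrade(speech, channel, snr_db=10.0, rng=rng)
measured_snr_db = 10 * np.log10(np.mean(filtered ** 2) / np.mean(noise ** 2))
```

Colored noise could be simulated in the same way by passing the white noise through a shaping filter before scaling it.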