Compensation for Nonlinear Distortion in Noise for Robust Speech Recognition.


Abstract

The performance, reliability, and ubiquity of automatic speech recognition systems have flourished in recent years due to steadily increasing computational power and technological innovations such as hidden Markov models, weighted finite-state transducers, and deep learning methods. One problem that plagues speech recognition systems, especially those that operate offline and have been trained on specific in-domain data, is the deleterious effect of noise on recognition accuracy. Historically, robust speech recognition research has focused on traditional noise types such as additive noise, linear filtering, and reverberation. This thesis describes the effects of nonlinear dynamic range compression on automatic speech recognition and develops a number of novel techniques for characterizing and counteracting it. Dynamic range compression is any function that reduces the dynamic range of an input signal. It is a widely used tool in audio engineering and is almost always a component of a practical telecommunications system. Despite this ubiquity, this thesis is the first work to comprehensively study and address the effect of dynamic range compression on speech recognition.

More specifically, this thesis treats the problem of dynamic range compression in three ways: (1) blind amplitude normalization methods, which counteract dynamic range compression when its parameter values allow the function to be mathematically inverted; (2) blind amplitude reconstruction techniques, i.e., declipping, which attempt to reconstruct clipped segments of the speech signal that are lost through non-invertible dynamic range compression; and (3) matched-training techniques, which attempt to select the pre-trained acoustic model with the closest set of compression parameters. All three of these methods rely on robust estimation of the dynamic range compression distortion parameters. Novel algorithms for the blind prediction of these parameters are also introduced. The algorithms' quality is evaluated in terms of the degree to which they decrease speech recognition word error rate, as well as the degree to which they increase a given speech signal's signal-to-noise ratio. In all evaluations, the possibility of independent additive noise following the application of dynamic range compression is assumed.
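The distinction the abstract draws between invertible and non-invertible compression can be illustrated with a small sketch. The abstract does not give the thesis's actual compression function, so the code below assumes a simple static compressor with two hypothetical parameters, `threshold` and `ratio`: magnitudes above `threshold` have their level above the threshold divided by `ratio` on a logarithmic scale, and an infinite `ratio` corresponds to hard clipping. The function names `compress` and `expand` are illustrative only.

```python
import math


def compress(samples, threshold, ratio):
    """Apply an assumed static dynamic range compressor to a list of samples.

    Samples with magnitude at or below `threshold` pass through unchanged.
    Larger samples are mapped to threshold * (|s| / threshold) ** (1 / ratio),
    keeping their sign. With ratio = math.inf this reduces to hard clipping.
    """
    out = []
    for s in samples:
        mag = abs(s)
        if mag <= threshold:
            out.append(s)
        else:
            compressed = threshold * (mag / threshold) ** (1.0 / ratio)
            out.append(math.copysign(compressed, s))
    return out


def expand(samples, threshold, ratio):
    """Invert `compress` when the parameters are known and the ratio is finite.

    This mirrors the idea of amplitude normalization for the invertible case.
    Hard clipping discards the information needed for inversion, so an
    infinite ratio is rejected here.
    """
    if math.isinf(ratio):
        raise ValueError("hard clipping (infinite ratio) cannot be inverted")
    out = []
    for s in samples:
        mag = abs(s)
        if mag <= threshold:
            out.append(s)
        else:
            restored = threshold * (mag / threshold) ** ratio
            out.append(math.copysign(restored, s))
    return out
```

Under these assumptions, `expand(compress(x, 0.5, 2.0), 0.5, 2.0)` recovers `x` up to floating-point error, whereas `compress(x, 0.5, math.inf)` flattens every sample above the threshold to the same magnitude. The second case is where a reconstruction (declipping) method would be needed instead of a direct inverse. In a blind setting the parameters are not known in advance and would have to be estimated from the distorted signal, which this sketch does not attempt.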

Bibliographic record

  • Author

    Harvilla, Mark J.

  • Author affiliation

    Carnegie Mellon University

  • Degree-granting institution: Carnegie Mellon University
  • Subject: Engineering Electronics and Electrical; Engineering Computer; Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 150 p.
  • Total pages: 150
  • Original format: PDF
  • Text language: English
