Estimators of power spectrum and binary mask for improved speech intelligibility.

Abstract

This dissertation explores approaches to speech enhancement that could improve speech intelligibility under adverse conditions. The research was inspired by the objective measures used to predict speech intelligibility: rather than only predicting intelligibility, the speech features these measures rely on can also be used to estimate the clean speech, and may thereby improve the intelligibility of corrupted speech.

The ideal binary mask (IdBM) assumes that the local signal-to-noise ratio (SNR) is known; when it is available, the mask can be used to restore speech intelligibility. Ideal binary masking is partly motivated by a widely used objective measure, the articulation index (AI), which assumes that speech intelligibility depends on the proportion of time the speech signal power exceeds the masker power. Motivated by the IdBM, the author proposed several statistical-model-based methods that model the binary mask and estimate the clean speech by estimating its power spectrum. Assuming a Gaussian model for the speech and noise power spectra, and that the noisy observation equals the sum of the clean speech and noise power spectra, the proposed maximum a posteriori (MAP) estimator yields a binary mask. The author further proposed a statistical model for the instantaneous SNR, along with soft masking that incorporates SNR uncertainty. Using these models, the author explored other objective functions for estimating the speech power spectrum and proposed a number of estimators. These estimators were evaluated with objective speech quality measures, and all of them were found to improve speech quality significantly.

Inspired by another objective measure of speech intelligibility, the speech transmission index (STI), the author proposed machine-learning-based estimators of the binary mask that use the amplitude modulation spectrum (AMS) as features. The multi-layer perceptron (MLP) and the Gaussian mixture model (GMM) served as the basic classifiers. The enhanced speech was tested with normal-hearing listeners and showed improved speech intelligibility at -5 and 0 dB global SNR.

Another speech enhancement method, which combines estimators of speech and of noise, was also proposed. This method approximates the signal-to-residual noise ratio (SNRESI) as a function of the a priori SNR and the gain function. The SNRESI was found to be a very important indicator of speech intelligibility improvement. By evaluating the approximated SNRESI, the method optimizes the selection between the speech and noise estimators to improve the output SNR.

To summarize, this dissertation proposed a number of new speech enhancement methods aimed at improving speech intelligibility. Improving speech intelligibility for normal-hearing listeners remains an extremely difficult problem. In a controlled environment, however, as the machine-learning-based method of this dissertation shows, speech intelligibility can be improved.
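As a rough illustration of the ideal binary mask described in the abstract, the sketch below keeps a time-frequency unit when its local SNR exceeds a threshold and discards it otherwise. It is not the dissertation's implementation: the function names, the 0 dB threshold, and the toy power values are all assumptions chosen for illustration. The noisy power is formed as the sum of the speech and noise powers, following the additive assumption stated in the abstract.

```python
import math

def local_snr_db(speech_power, noise_power, eps=1e-12):
    """Local SNR in dB for one time-frequency unit (eps avoids log of zero)."""
    return 10.0 * math.log10((speech_power + eps) / (noise_power + eps))

def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return a 0/1 mask with the same shape as the input power spectrograms.

    speech_power, noise_power: lists of frames, each a list of per-bin powers.
    threshold_db: local SNR criterion; 0 dB is an assumed, illustrative value.
    """
    return [
        [1 if local_snr_db(s, n) > threshold_db else 0
         for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_power, noise_power)
    ]

def apply_mask(noisy_power, mask):
    """Zero out the time-frequency units that the mask rejects."""
    return [[p * m for p, m in zip(p_frame, m_frame)]
            for p_frame, m_frame in zip(noisy_power, mask)]

# Toy data (made up): 2 frames x 3 frequency bins of power values.
speech = [[4.0, 0.1, 2.0], [0.5, 3.0, 0.2]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
# Additive power model: noisy power = speech power + noise power.
noisy = [[s + n for s, n in zip(sf, nf)] for sf, nf in zip(speech, noise)]

mask = ideal_binary_mask(speech, noise)
masked = apply_mask(noisy, mask)
print(mask)    # [[1, 0, 1], [0, 1, 0]]
print(masked)  # [[5.0, 0.0, 3.0], [0.0, 4.0, 0.0]]
```

The mask is "ideal" because it needs the clean speech and noise powers separately, which are unknown in practice; the estimators summarized in the abstract aim to approximate it from the noisy observation alone.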

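Bibliographic record

  • Author

    Lu, Yang

  • Author affiliation

    The University of Texas at Dallas

  • Degree-granting institution The University of Texas at Dallas
  • Subject Engineering Electronics and Electrical
  • Degree Ph.D.
  • Year 2010
  • Pages 131 p.
  • Total pages 131
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Rehabilitation medicine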
