Reconstruction of incomplete spectrograms for robust speech recognition.

Abstract

The performance of automatic speech recognition (ASR) systems degrades greatly when speech is corrupted by noise. Missing feature methods attempt to reduce this degradation by deleting components of a time-frequency representation of speech (such as a spectrogram) that exhibit low signal-to-noise ratio (SNR). Recognition is then performed using only the remaining components of the incomplete spectrogram. These methods have been shown to yield recognition accuracies that are very robust to the effects of additive noise. However, conventional missing feature methods, which modify the classifier used to perform the recognition, suffer from the drawback that they are constrained to use the log-spectral vectors of the spectrogram as features for recognition. It is well known that recognition systems using log-spectral features perform poorly compared with systems using cepstral features.

In this thesis we propose two new approaches that recast the missing feature paradigm as a data compensation problem, by reconstructing missing elements to obtain complete spectrograms. In the first approach, referred to as cluster-based reconstruction, incoming log-spectral vectors from clean speech are clustered. Missing spectrographic features from noisy data are recovered by first identifying the closest cluster based on the values of the features that are present, and then estimating the missing values using MAP procedures. The second approach, referred to as covariance-based reconstruction, uses MAP procedures to estimate the values of the missing components of the spectrogram based on their correlations with the elements that are present. Both methods take into account the bounds on the clean spectrogram imposed by additive noise. In either case, cepstral features are computed from the reconstructed spectrograms and used for recognition without any modification of the speech recognition system.

When corrupt regions of the spectrogram are known a priori, recognition accuracies resulting from reconstruction methods are seen to be much higher than those obtained with the best current missing feature methods based on modification of the recognition system. The proposed spectrogram reconstruction methods are also computationally less expensive than the best conventional missing feature methods.

We also propose two methods that attempt to identify corrupt regions of the spectrographic representations of incoming speech. The first method utilizes the noise spectrum estimates of vector Taylor series (VTS) compensation for noise-corrupted speech, while the second method treats the identification task as a classic Bayesian classification problem. Combining the best method to identify corrupt regions with the best method to reconstruct them produces recognition accuracies better than any other known algorithm for speech in additive white noise. We also observe significant improvement in recognition accuracy for speech in the presence of background music if the locations of corrupted spectrographic regions are known a priori, but we have been less successful in blind identification of these corrupt regions for these signals.
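
As a rough illustration of the cluster-based reconstruction idea described in the abstract, the following Python sketch fills in the unreliable components of a single log-spectral frame. It is not the thesis's implementation: the abstract does not give the actual MAP formulation, so the sketch assumes each cluster has a diagonal covariance (in which case the bounded MAP estimate of a missing component reduces to the cluster mean, clipped to its bound), and it assumes the bound imposed by additive noise is that the clean value cannot exceed the observed noisy value. The function name `reconstruct_frame` and the toy centroids are hypothetical.

```python
from typing import List, Sequence


def reconstruct_frame(
    noisy: Sequence[float],
    reliable: Sequence[bool],
    cluster_means: Sequence[Sequence[float]],
) -> List[float]:
    """Fill in unreliable log-spectral components of one frame.

    noisy         -- observed log-spectral vector of the noisy speech
    reliable      -- True where a component is kept, False where it is "missing"
    cluster_means -- centroids learned from clean speech (hypothetical input)
    """

    # Step 1: pick the closest cluster using only the reliable components.
    def distance(mean: Sequence[float]) -> float:
        return sum(
            (x - m) ** 2 for x, m, keep in zip(noisy, mean, reliable) if keep
        )

    best = min(cluster_means, key=distance)

    # Step 2: estimate each missing component from the chosen cluster.
    # Assumption: diagonal covariance, so the estimate is the cluster mean,
    # clipped so that it never exceeds the observed noisy value.
    return [
        x if keep else min(m, x)
        for x, m, keep in zip(noisy, best, reliable)
    ]


# Toy example with two hypothetical clean-speech centroids.
means = [[10.0, 9.0, 8.0], [4.0, 5.0, 6.0]]
frame = reconstruct_frame([9.8, 9.5, 8.1], [True, False, True], means)
print(frame)
```

In this toy call the reliable components match the first centroid most closely, so the missing middle component is replaced by that centroid's value. Had the noisy observation been lower than the centroid value, the observation itself would act as the upper bound. Cepstral features would then be computed from the completed frame, as the abstract describes.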