The Journal of the Acoustical Society of America

Cepstral representation of speech motivated by time–frequency masking: An application to speech recognition

Abstract

A new spectral representation incorporating time–frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency-domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments demonstrated that DyC substantially improves performance in phoneme classification and phrase recognition. The new spectral representation simulates a perceived spectrum. It enhances formant transitions, which provide relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or speaker-dependent time-invariant spectral features. These properties are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing, and the smoothing characteristics are a function of the time interval between a masker and a signal. In experiments using hidden Markov models (HMMs), DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis in both phoneme classification and phrase recognition. The improvement over the cepstrum parameter was even greater in speaker-independent recognition than in speaker-dependent recognition. Furthermore, in a classification experiment on phonemes contaminated by noise, DyC with only 16 coefficients achieved higher recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients.
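
The abstract gives no formula for the matrix lifter, so the following Python sketch is only one plausible reading of the description. It assumes that the masked cepstrum at frame i is the current cepstrum minus a weighted sum of the preceding frames, and that each lifter column (one per masker-to-signal lag) is a Gaussian over quefrency whose gain and width shrink as the lag grows. The function names and the parameters `alpha`, `beta`, `q0`, and `nu` are hypothetical and do not come from the paper.

```python
import math


def make_matrix_lifter(n_ceps, n_lags, alpha=0.25, beta=0.7, q0=20.0, nu=0.5):
    """Build lifter[k][n-1]: weight for quefrency index k at lag n (in frames).

    Assumption (not from the abstract): each column is a Gaussian over
    quefrency. A narrower Gaussian lifter means stronger spectral smoothing,
    so the width shrinks with lag; the gain also decays with lag.
    """
    lifter = []
    for k in range(n_ceps):
        row = []
        for n in range(1, n_lags + 1):
            gain = alpha * beta ** (n - 1)   # masking weakens with lag
            width = q0 * nu ** (n - 1)       # smoothing broadens with lag
            row.append(gain * math.exp(-(k * k) / (2.0 * width * width)))
        lifter.append(row)
    return lifter


def dynamic_cepstrum(ceps, lifter):
    """Subtract the liftered preceding frames from each cepstral frame.

    ceps is a list of frames, each a list of cepstral coefficients.
    The output has the same shape as the input.
    """
    n_lags = len(lifter[0])
    out = []
    for i, frame in enumerate(ceps):
        masked = []
        for k, c in enumerate(frame):
            # Masking pattern: weighted sum over the available preceding frames.
            mask = sum(
                lifter[k][n - 1] * ceps[i - n][k]
                for n in range(1, min(n_lags, i) + 1)
            )
            masked.append(c - mask)
        out.append(masked)
    return out


# Toy input: silence for three frames, then a constant two-coefficient frame.
ceps = [[0.0, 0.0]] * 3 + [[1.0, 0.5]] * 6
dyc = dynamic_cepstrum(ceps, make_matrix_lifter(n_ceps=2, n_lags=4))
```

Under these assumptions the onset frame passes through unchanged while the later, stationary frames are attenuated, which mirrors the abstract's claim that transitions are enhanced and stationary spectral properties are suppressed. The output vector has the same number of coefficients per frame as the input cepstrum.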
