Cepstral representation of speech motivated by time–frequency masking: An application to speech recognition

Kiyoaki Aikawa; Harald Singer; Hideki Kawahara; Yohichi Tohkura

Abstract

A new spectral representation incorporating time–frequency forward masking is proposed. This masked spectral representation is efficiently represented by a quefrency domain parameter called dynamic-cepstrum (DyC). Automatic speech recognition experiments have demonstrated that DyC powerfully improves performance in phoneme classification and phrase recognition. This new spectral representation simulates a perceived spectrum. It enhances formant transition, which provides relevant cues for phoneme perception, while suppressing temporally stationary spectral properties, such as the effect of microphone frequency characteristics or the speaker-dependent time-invariant spectral feature. These features are advantageous for speaker-independent speech recognition. DyC can efficiently represent both the instantaneous and transitional aspects of a running spectrum with a vector of the same size as a conventional cepstrum. DyC is calculated from a cepstrum time sequence using a matrix lifter. Each column vector of the matrix lifter performs spectral smoothing. Smoothing characteristics are a function of the time interval between a masker and a signal. DyC outperformed a conventional cepstrum parameter obtained through linear predictive coding (LPC) analysis for both phoneme classification and phrase recognition by using hidden Markov models (HMMs). Compared with speaker-dependent recognition, an even greater improvement over the cepstrum parameter was found in speaker-independent speech recognition. Furthermore, DyC with only 16 coefficients exhibited higher speech recognition performance than a combination of the cepstrum and a delta-cepstrum with 32 coefficients for the classification experiment of phonemes contaminated by noises.
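The matrix-lifter step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the lifter's functional form or any parameter values, so everything specific here is an assumption made for illustration: the Gaussian lifter shape, the names `matrix_lifter` and `dynamic_cepstrum`, and the parameters `alpha`, `beta`, `q0`, and `nu` are all hypothetical. The sketch only shows the general idea: each past cepstrum frame is smoothed by a lag-dependent lifter column and subtracted from the current frame, so stationary components are attenuated while transitions stand out.

```python
import numpy as np

def matrix_lifter(n_coef, n_lags, alpha=0.3, beta=0.7, q0=20.0, nu=0.9):
    """Hypothetical lifter matrix (not the paper's values).

    Column k-1 holds Gaussian quefrency weights for a masker k frames back.
    """
    q = np.arange(n_coef)[:, None]           # quefrency index, shape (n_coef, 1)
    k = np.arange(1, n_lags + 1)[None, :]    # lag in frames, shape (1, n_lags)
    gain = alpha * beta ** (k - 1)           # assumed: masking decays with lag
    width = q0 * nu ** (k - 1)               # assumed: narrower lifter (more
                                             # spectral smoothing) at longer lags
    return gain * np.exp(-(q ** 2) / (2.0 * width ** 2))

def dynamic_cepstrum(cep, lifter):
    """Subtract liftered past frames from each frame.

    cep:    cepstrum time sequence, shape (n_frames, n_coef)
    lifter: matrix lifter, shape (n_coef, n_lags)
    Returns an array of the same shape as cep.
    """
    n_lags = lifter.shape[1]
    out = cep.astype(float).copy()
    for k in range(1, n_lags + 1):
        # Frames t >= k are masked by the liftered frame at t - k.
        out[k:] -= cep[:-k] * lifter[:, k - 1]
    return out

# Usage: a step change in an otherwise stationary 16-coefficient cepstrum.
cep = np.zeros((20, 16))
cep[10:] = 1.0
dyc = dynamic_cepstrum(cep, matrix_lifter(n_coef=16, n_lags=4))
```

In this toy input, the frame at the step (index 10) passes through unchanged, while later stationary frames are reduced by the accumulated masking. The output keeps the same vector size as the input cepstrum, in line with the abstract's statement that DyC uses a vector of the same size as a conventional cepstrum.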


Bibliographic details

Source: The Journal of the Acoustical Society of America, 1996, Issue 1, 12 pages
Authors: Kiyoaki Aikawa; Harald Singer; Hideki Kawahara; Yohichi Tohkura
Original format: PDF
Language: English
Chinese Library Classification: Acoustics

