IEEE Transactions on Audio, Speech and Language Processing

Significance of the Modified Group Delay Feature in Speech Recognition


Abstract

Spectral representation of speech is complete when both the Fourier transform magnitude and phase spectra are specified. In conventional speech recognition systems, features are generally derived from the short-time magnitude spectrum. Although the importance of Fourier transform phase in speech perception has been realized, few attempts have been made to extract features from it. This is primarily because the resonances of the speech signal, which manifest as transitions in the phase spectrum, are completely masked by the wrapping of the phase spectrum. Hence, an alternative to processing the Fourier transform phase for extracting speech features is to process the group delay function, which can be directly computed from the speech signal. The group delay function has been used in earlier efforts to extract pitch and formant information from the speech signal. In all these efforts, no attempt was made to extract features from the speech signal and use them for speech recognition applications. This is primarily because the group delay function fails to capture the short-time spectral structure of speech, owing to zeros that are close to the unit circle in the z-plane and also to pitch periodicity effects. In this paper, the group delay function is modified to overcome these effects. Cepstral features are extracted from the modified group delay function and are called the modified group delay feature (MODGDF). The MODGDF is used for three speech recognition tasks, namely speaker, language, and continuous-speech recognition. Based on the results of feature and performance evaluation, the significance of the MODGDF as a new feature for speech recognition is discussed.
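The abstract notes that the group delay function can be computed directly from the signal, with no phase unwrapping. The following is a minimal sketch of the standard (unmodified) group delay only, not the paper's MODGDF, whose modification is not spelled out in the abstract. It assumes the well-known identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the transform of x[n] and Y is the transform of n·x[n]. The function names, the direct O(N^2) DFT, and the `eps` floor are illustrative choices, not part of the source.

```python
import cmath
import math


def dft(x, n_fft):
    """Direct O(N^2) DFT of x, implicitly zero-padded to n_fft points."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft)
    ]


def group_delay(x, n_fft=64, eps=1e-12):
    """Group delay (in samples) at each DFT bin, without phase unwrapping.

    Uses tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the transform
    of x[n] and Y is the transform of n*x[n]. The eps floor (an assumption
    of this sketch) only guards against division by exactly zero.
    """
    X = dft(x, n_fft)
    Y = dft([n * v for n, v in enumerate(x)], n_fft)
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


# A pure delay of 3 samples has a constant group delay of 3 at every bin.
delayed_impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
tau = group_delay(delayed_impulse)

# A zero at z = 0.99, close to the unit circle, makes |X|^2 tiny near w = 0,
# so the group delay there becomes very large in magnitude (about -99).
spiky = group_delay([1.0, -0.99])
```

In the second call, the small denominator near the zero produces the kind of spike that, per the abstract, obscures the short-time spectral structure and motivates modifying the function.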
