Journal: International Journal of Speech Technology

Designing of Gabor filters for spectro-temporal feature extraction to improve the performance of ASR system

Abstract

Existing automatic speech recognition (ASR) systems use the spectral or temporal features of speech. The performance of such systems is still poor compared to human hearing, especially in noisy environments. This paper concentrates on the extraction of spectro-temporal features based on physiologically and psychoacoustically inspired approaches. Here, two-dimensional Gabor filters are used to estimate the spectro-temporal features from a time-frequency representation of uttered speech signals. The Gabor filters are designed using the concept of a constant Q factor: the human auditory perception system is found to maintain an approximately constant Q in its frequency response along the chain of its filter bank. Constant-Q analysis ensures that the Gabor filters occupy a set of geometrically spaced spectral and temporal bins. A time-frequency representation of the speech signal is a key ingredient of Gabor-based feature extraction. For the time-frequency mapping, a Gammatonegram is adopted instead of conventional spectrogram representations. The performance of the ASR system with the proposed feature set is experimentally validated using the AURORA2 noisy digit database. Under clean training, the proposed features obtained a relative improvement of about 50% in word error rate (WER) compared to Mel-frequency cepstral coefficient (MFCC) features. A relative improvement of 23% in WER is also obtained compared with existing spectro-temporal feature extraction methods. Further analysis is carried out on TIMIT corrupted with noise samples taken from the NOISEX-92 database. The experiments demonstrate the robustness of the proposed features for building a robust acoustic model for the ASR system.
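
The abstract does not give the filter equations, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a complex 2D Gabor filter with a Gaussian envelope whose width is inversely proportional to the carrier modulation frequency, which is one common way to keep Q (center frequency over bandwidth) constant across a bank. The function names, the parameter `q`, the modulation-frequency ranges, and the random array standing in for a Gammatonegram are all hypothetical.

```python
import numpy as np

def geometric_centers(f_min, f_max, n):
    """n center modulation frequencies, geometrically spaced from f_min to f_max."""
    return f_min * (f_max / f_min) ** (np.arange(n) / (n - 1))

def gabor_2d(omega_t, omega_f, q=3.0):
    """Complex 2D Gabor filter of shape (freq, time).

    omega_t, omega_f: carrier frequencies in radians per frame / per channel.
    q: assumed constant; envelope widths scale as q / omega, so the ratio of
       center frequency to spectral spread is the same for every filter.
    """
    sigma_t, sigma_f = q / omega_t, q / omega_f
    half_t, half_f = int(np.ceil(3 * sigma_t)), int(np.ceil(3 * sigma_f))
    t = np.arange(-half_t, half_t + 1)
    f = np.arange(-half_f, half_f + 1)
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (F / sigma_f) ** 2))
    carrier = np.exp(1j * (omega_t * T + omega_f * F))
    g = envelope * carrier
    # Subtract a scaled envelope so the filter has zero mean (no DC response).
    return g - envelope * (g.sum() / envelope.sum())

def filter_tf(tf, g):
    """2D linear convolution via FFT, cropped to the input size."""
    n_f = tf.shape[0] + g.shape[0] - 1
    n_t = tf.shape[1] + g.shape[1] - 1
    full = np.fft.ifft2(np.fft.fft2(tf, (n_f, n_t)) * np.fft.fft2(g, (n_f, n_t)))
    off_f, off_t = g.shape[0] // 2, g.shape[1] // 2
    return full[off_f:off_f + tf.shape[0], off_t:off_t + tf.shape[1]]

# Hypothetical ranges, in cycles per frame (temporal) and per channel (spectral).
temporal = 2 * np.pi * geometric_centers(0.05, 0.25, 4)
spectral = 2 * np.pi * geometric_centers(0.05, 0.25, 4)
bank = [gabor_2d(wt, wf) for wt in temporal for wf in spectral]

# Random stand-in for a 23-channel, 100-frame time-frequency representation.
tf_rep = np.random.default_rng(0).random((23, 100))
features = np.stack([np.abs(filter_tf(tf_rep, g)) for g in bank])
print(features.shape)
```

Each filter yields one feature map with the same shape as the input, so the stacked result has one leading entry per filter. A real front end would replace the random array with a Gammatonegram and would typically subsample the filter outputs before passing them to the recognizer.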