Journal: IEEE Transactions on Audio, Speech, and Language Processing

A Noise-Robust FFT-Based Auditory Spectrum With Application in Audio Classification


Abstract

In this paper, we investigate the noise robustness of Wang and Shamma's early auditory (EA) model for the calculation of an auditory spectrum in audio classification applications. First, a stochastic analysis is conducted wherein an approximate expression of the auditory spectrum is derived to justify the noise-suppression property of the EA model. Second, we present an efficient fast Fourier transform (FFT)-based implementation for the calculation of a noise-robust auditory spectrum, which allows flexibility in the extraction of audio features. To evaluate the performance of the proposed FFT-based auditory spectrum, a set of speech/music/noise classification tasks is carried out wherein a support vector machine (SVM) algorithm and a decision tree learning algorithm (C4.5) are used as the classifiers. Features used for classification include conventional Mel-frequency cepstral coefficients (MFCCs), MFCC-like features obtained from the original auditory spectrum (i.e., based on the EA model) and the proposed FFT-based auditory spectrum, as well as spectral features (spectral centroid, bandwidth, etc.) computed from the latter. Compared to the conventional MFCC features, both the MFCC-like and spectral features derived from the proposed FFT-based auditory spectrum show more robust performance in noisy test cases. Test results also indicate that, using the new MFCC-like features, the performance of the proposed FFT-based auditory spectrum is slightly better than that of the original auditory spectrum, while its computational complexity is reduced by an order of magnitude.
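
The spectral features named in the abstract (spectral centroid and bandwidth) can be illustrated with a minimal sketch. This is not the paper's FFT-based auditory spectrum: it assumes a plain Hann-windowed FFT power spectrum as a stand-in, and the function name `spectral_centroid_and_bandwidth`, the 16 kHz sample rate, and the 1024-sample frame are hypothetical choices made only for illustration.

```python
import numpy as np

def spectral_centroid_and_bandwidth(frame, sample_rate):
    """Return (centroid_hz, bandwidth_hz) for one audio frame.

    Assumption: an ordinary FFT power spectrum is used here in place of
    the auditory spectrum described in the abstract.
    """
    # Taper the frame to reduce spectral leakage before the FFT.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    total = power.sum()
    if total == 0.0:
        # A silent frame has no defined centroid; report zeros.
        return 0.0, 0.0

    # Centroid: power-weighted mean frequency.
    centroid = float((freqs * power).sum() / total)
    # Bandwidth: power-weighted standard deviation around the centroid.
    bandwidth = float(np.sqrt((((freqs - centroid) ** 2) * power).sum() / total))
    return centroid, bandwidth

# Usage: a 1 kHz test tone sampled at 16 kHz (hypothetical values).
sample_rate = 16000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
centroid, bandwidth = spectral_centroid_and_bandwidth(tone, sample_rate)
print(f"centroid = {centroid:.1f} Hz, bandwidth = {bandwidth:.1f} Hz")
```

For a pure tone the centroid should sit close to the tone frequency and the bandwidth should be narrow, whereas broadband noise would spread both. The same two statistics could in principle be computed over any spectrum-like representation, including an auditory spectrum.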
