Journal: IEEE Journal of Selected Topics in Signal Processing

Sparse Approximations for Drum Sound Classification


Abstract

So far, little work has used features derived from temporal approximations of signals for audio recognition. Time–frequency tradeoffs are an important issue in signal processing; depending on the dictionary, sparse representations using overcomplete dictionaries can offer more time–frequency flexibility than the standard short-time Fourier transform. Moreover, spectral feature methods cannot capture the precise temporal structure of signals. Here, we present a biologically inspired three-step process for audio classification: 1) Efficient atomic functions are learned in an unsupervised manner on mixtures of percussion sounds (drum phrases), optimizing both the length and the shape of the atoms. 2) An analog spike model is used to sparsely approximate percussion sound signals (bass drum, snare drum, hi-hat). The spike model consists of temporally shifted versions of the learned atomic functions, each having a precise temporal position and amplitude. Given a set of atomic functions, the decomposition is obtained with matching pursuit. 3) Features are extracted from the resulting spike representation of the signal. Using a support vector machine (SVM), our method achieves a classification accuracy of 87.8% in a 3-class database transfer task. Using gammatone functions instead of the learned sparse functions yields an even better classification rate of 97.6%. Testing the features on sounds containing additive white Gaussian noise reveals that sparse approximation features are far more robust to such distortions than our benchmark feature set of timbre descriptor (TD) features.
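Step 2 of the abstract, decomposing a signal into temporally shifted atoms with matching pursuit, can be illustrated with a short sketch. This is not the authors' implementation: the function names `gammatone_atom` and `matching_pursuit`, the atom parameters (filter order, bandwidth, length), and the brute-force search over all shifts are assumptions chosen for readability. Each returned "spike" is a tuple of atom index, temporal position, and amplitude, mirroring the spike model described above.

```python
import math

def gammatone_atom(center_hz, sample_rate, length, order=4, bandwidth_hz=100.0):
    """Unit-norm gammatone: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*f*t).
    All parameter values here are illustrative assumptions."""
    atom = []
    for n in range(length):
        t = n / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * bandwidth_hz * t)
        atom.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    norm = math.sqrt(sum(a * a for a in atom))
    return [a / norm for a in atom]

def matching_pursuit(signal, atoms, n_spikes):
    """Greedy shift-invariant matching pursuit (brute-force sketch).

    Returns (spikes, residual), where each spike is
    (atom_index, shift_in_samples, amplitude).
    Atoms are assumed to have unit norm.
    """
    residual = list(signal)
    spikes = []
    for _ in range(n_spikes):
        best = None  # (abs_correlation, correlation, atom_index, shift)
        for k, atom in enumerate(atoms):
            for shift in range(len(residual) - len(atom) + 1):
                corr = sum(a * residual[shift + i] for i, a in enumerate(atom))
                if best is None or abs(corr) > best[0]:
                    best = (abs(corr), corr, k, shift)
        if best is None or best[0] == 0.0:
            break  # nothing left to explain
        _, amplitude, k, shift = best
        # Subtract the selected atom's contribution from the residual.
        for i, a in enumerate(atoms[k]):
            residual[shift + i] -= amplitude * a
        spikes.append((k, shift, amplitude))
    return spikes, residual

# Toy example: two non-overlapping atoms hidden in a silent signal.
atoms = [gammatone_atom(500.0, 8000.0, 64), gammatone_atom(2000.0, 8000.0, 64)]
signal = [0.0] * 256
for i, a in enumerate(atoms[0]):
    signal[10 + i] += 2.0 * a
for i, a in enumerate(atoms[1]):
    signal[150 + i] -= 1.5 * a

spikes, residual = matching_pursuit(signal, atoms, n_spikes=2)
for atom_index, shift, amplitude in spikes:
    print(atom_index, shift, round(amplitude, 6))
```

In this toy case the two spikes recover the planted atoms, positions, and amplitudes, and the residual is numerically zero. On real drum recordings the search over shifts would normally be done with FFT-based cross-correlation rather than nested loops, and the number of spikes or a residual-energy threshold would serve as the stopping criterion.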
