Published in: IEEE Signal Processing Letters

CNN-Based Learnable Gammatone Filterbank and Equal-Loudness Normalization for Environmental Sound Classification


Abstract

For environmental sound classification (ESC), this letter presents a learnable auditory filterbank based on a one-dimensional (1D) convolutional neural network with a strong psychophysiological inductive bias in the form of a gammatone filterbank and an equal-loudness prompting normalization. A number of earlier ESC methods learn auditory features by applying plain 1D convolutions to raw input waveforms, with the aim of outperforming traditional handcrafted features such as a mel-frequency filterbank. However, the large number of parameters involved in these convolutions suggests that such methods will not generalize better than a model defined by fewer parameters, which is the kind of model considered in this letter. Here, a learnable gammatone filterbank layer, consisting of 1D kernels represented by a parametric form of bandpass gammatone filters, is proposed for acquiring a time-frequency representation of the raw waveform. A normalization is also proposed whose learnable parameters control the trade-off between energy equalization and structure preservation in the spectro-temporal domain. To verify the effectiveness of the network and the normalization, ESC experiments were conducted on the ESC-50 and UrbanSound8K datasets. Compared to other state-of-the-art networks, the proposed network performed better on both datasets. In addition, an ensemble architecture achieved a further performance improvement.
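
To make the idea of a parametric gammatone kernel concrete, the following is a minimal NumPy sketch. It is not the letter's implementation. It uses the standard gammatone impulse response, g(t) = a t^(n-1) exp(-2 pi b t) cos(2 pi f t + phi), and treats the center frequency f and bandwidth b as the quantities that a learnable layer would train. The function names, default values, and peak normalization are assumptions made for illustration only.

```python
import numpy as np


def gammatone_kernel(center_hz, bandwidth_hz, sample_rate=16000,
                     length=401, order=4, amplitude=1.0, phase=0.0):
    """Sample a gammatone impulse response as a 1D convolution kernel.

    Hypothetical helper: in a learnable layer, center_hz and bandwidth_hz
    would be trainable parameters; here they are plain floats.
    """
    t = np.arange(length) / sample_rate
    # Gamma-shaped envelope: rises as t^(order-1), decays exponentially.
    envelope = amplitude * t ** (order - 1) * np.exp(-2.0 * np.pi * bandwidth_hz * t)
    # Carrier tone at the filter's center frequency.
    kernel = envelope * np.cos(2.0 * np.pi * center_hz * t + phase)
    # Peak normalization (an assumption) keeps kernels on a comparable scale.
    peak = np.max(np.abs(kernel))
    return kernel / peak if peak > 0 else kernel


def gammatone_filterbank(waveform, center_freqs_hz, bandwidths_hz,
                         sample_rate=16000, length=401):
    """Filter a raw waveform with one gammatone kernel per channel.

    Returns an array of shape (num_channels, num_samples), i.e. a simple
    time-frequency representation of the input.
    """
    kernels = [gammatone_kernel(f, b, sample_rate, length)
               for f, b in zip(center_freqs_hz, bandwidths_hz)]
    return np.stack([np.convolve(waveform, k, mode="same") for k in kernels])
```

Each channel in this sketch is described by only a few scalars (center frequency, bandwidth, and optionally amplitude and phase) instead of one free weight per kernel tap, which is the parameter reduction the abstract argues for.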
