CNN-Based Learnable Gammatone Filterbank and Equal-Loudness Normalization for Environmental Sound Classification

Park Hyunsin; Yoo Chang D.


Abstract

For environmental sound classification (ESC), this letter presents a learnable auditory filterbank based on a one-dimensional (1D) convolutional neural network with a strong psychophysiological inductive bias in the form of a gammatone filterbank and an equal-loudness prompting normalization. A number of earlier ESC methods learn auditory features by applying plain 1D convolutions to raw input waveforms, with the aim of outperforming traditional handcrafted features such as a mel-frequency filterbank. However, the large number of parameters involved in these convolutions suggests that such methods will not generalize better than a model defined by a smaller number of parameters, which is the kind of model considered in this letter. Here, a learnable gammatone filterbank layer, consisting of 1D kernels represented by a parametric form of bandpass gammatone filters, is proposed for acquiring a time-frequency representation of the raw waveform. A normalization with learnable parameters that control the trade-off between energy equalization and structure preservation in the spectro-temporal domain is also proposed. To verify the effectiveness of the considered network and the normalization, ESC experiments were conducted on the ESC-50 and UrbanSound8K datasets. Compared to other state-of-the-art networks, the considered network performed better on both datasets. In addition, an ensemble architecture achieved further performance improvement.
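To make the idea of 1D kernels "represented by a parametric form" concrete, the following is a minimal NumPy sketch of a fixed (not learned) gammatone filterbank. It is not the authors' implementation: the function names, the filter order of 4, the kernel length, the sample rate, and the ERB-based bandwidth rule (the commonly used Glasberg and Moore approximation with the 1.019 scaling factor) are all assumptions chosen for illustration. In a learnable layer, the center frequencies and bandwidths would presumably be trainable parameters rather than fixed inputs.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation).

    Assumption: this bandwidth rule is a common default, not taken from the letter.
    """
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_kernels(center_hz, sample_rate=16000, kernel_len=401, order=4):
    """Build gammatone impulse responses, one row per center frequency.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc).
    Returns an array of shape (num_filters, kernel_len).
    """
    center_hz = np.asarray(center_hz, dtype=float)
    t = np.arange(kernel_len) / sample_rate   # time axis in seconds
    fc = center_hz[:, None]                   # shape (num_filters, 1) for broadcasting
    b = 1.019 * erb(fc)                       # per-filter bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    # Scale every kernel to unit peak amplitude (an illustrative choice).
    return g / np.max(np.abs(g), axis=1, keepdims=True)


def apply_filterbank(waveform, kernels):
    """Filter a 1D waveform with every kernel.

    Returns an array of shape (num_filters, len(waveform)) when the waveform
    is at least as long as the kernels.
    """
    return np.stack([np.convolve(waveform, k, mode="same") for k in kernels])


if __name__ == "__main__":
    # Hypothetical usage: a 1 kHz tone passed through five filters.
    centers = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
    kernels = gammatone_kernels(centers)
    tone = np.sin(2 * np.pi * 1000.0 * np.arange(16000) / 16000.0)
    response = apply_filterbank(tone, kernels)
    energy = (response ** 2).mean(axis=1)
    print("strongest filter index:", int(np.argmax(energy)))
```

Each kernel here is fully determined by two numbers (center frequency and bandwidth), whereas a plain 1D convolution of the same length would have 401 free weights per filter. That difference in parameter count is the contrast the abstract draws.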

Bibliographic details

Source
IEEE Signal Processing Letters | 2020, issue 2020 | pp. 411-415 | 5 pages
Authors
Park Hyunsin; Yoo Chang D.
Author affiliations

Korea Adv Inst Sci & Technol Sch Elect Engn Daejeon 305701 South Korea;

Korea Adv Inst Sci & Technol Sch Elect Engn Daejeon 305701 South Korea;

Original format: PDF
Language: English
Keywords
Time-frequency analysis; Convolution; Two dimensional displays; Kernel; Machine learning; Training data; Bandwidth; ESC; CNN; LGTFB; EN;


