IEEE/ACM Transactions on Audio, Speech, and Language Processing

Constrained Learned Feature Extraction for Acoustic Scene Classification


Abstract

Deep neural networks (DNNs) have been proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to implement the discrete Fourier transform operation 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters in the module to facilitate fast adaptation to complex and variable acoustic signals. This module gives neural networks the ability to model audio signals from raw waveforms, without extra fast Fourier transform and filter bank patches. We then use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), or recurrent neural network (RNN) models. We evaluate the proposed strategy using four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018) as target tasks. We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and performance. For instance, the proposed approach reduces inference time by 8x while reducing the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
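The abstract does not include the authors' implementation, but the core idea of the simulated Fourier transform module can be sketched as a trainable linear map initialized with the DFT basis: a batch of framed raw-waveform segments is transformed by a single batched matrix multiply on the GPU, which is also where the reported speedup over per-frame FFT calls would come from. The PyTorch sketch below is a minimal illustration under these assumptions; the class name, the n_fft size, and the log-power readout are hypothetical choices, not the paper's exact design.

```python
# A minimal sketch (not the authors' released code) of a "simulated Fourier
# transform" layer: a linear layer whose weights are initialized to the real
# and imaginary parts of the DFT basis and then left trainable, so the
# network can adapt the transform to the acoustic signal.

import math
import torch
import torch.nn as nn

class SimulatedFourierTransform(nn.Module):
    def __init__(self, n_fft: int = 512):
        super().__init__()
        n = torch.arange(n_fft).float()
        # DFT basis: W[k, t] = exp(-2*pi*i*k*t / N), split into cos/sin parts.
        angles = 2.0 * math.pi * torch.outer(n, n) / n_fft
        self.real = nn.Parameter(torch.cos(angles))   # learnable cosine basis
        self.imag = nn.Parameter(-torch.sin(angles))  # learnable sine basis

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, n_fft) windowed raw-waveform segments.
        # Each part is one batched GEMM on the GPU, instead of per-frame FFTs.
        re = frames @ self.real.T
        im = frames @ self.imag.T
        # Log power spectrum as the feature handed to the classifier backbone.
        return torch.log1p(re.pow(2) + im.pow(2))

# Usage: feed framed raw audio straight into a downstream FCNN/CNN/RNN.
sft = SimulatedFourierTransform(n_fft=512)
features = sft(torch.randn(8, 100, 512))  # (8, 100, 512) log-spectrum features
```

Because the weights stay trainable, backpropagation is free to drift away from the exact DFT basis, which matches the abstract's framing of feature extraction as an adaptive machine learning problem rather than a fixed FFT-plus-filter-bank front end.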
