IEEE/ACM Transactions on Audio, Speech, and Language Processing

Constrained Learned Feature Extraction for Acoustic Scene Classification


Abstract

Deep neural networks (DNNs) have been proven to be powerful models for acoustic scene classification tasks. State-of-the-art DNNs have millions of connections and are computationally intensive, making them difficult to deploy on systems with limited resources. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to implement the discrete Fourier transform operation 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters in the module to facilitate fast adaptation to complex and variable acoustic signals. This module gives neural networks the ability to model audio signals from raw waveforms, without extra fast Fourier transform and filter bank patches. We then use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), or recurrent neural network (RNN) models. We evaluate the proposed strategy using four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018) as target tasks. We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and performance. For instance, the proposed approach reduces inference time by 8x while reducing the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
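The abstract does not include the authors' implementation, but the core idea of the simulated Fourier transform module can be sketched as a trainable linear map initialized with the DFT basis: a batch of framed raw-waveform segments is transformed by a single batched matrix multiply on the GPU, which is also where the reported speedup over per-frame FFT calls would come from. The PyTorch sketch below is a minimal illustration under these assumptions; the class name, the n_fft size, and the log-power readout are hypothetical choices, not the paper's exact design.

```python
# A minimal sketch (not the authors' released code) of a "simulated Fourier
# transform" layer: a linear layer whose weights are initialized to the real
# and imaginary parts of the DFT basis and then left trainable, so the
# network can adapt the transform to the acoustic signal.

import math
import torch
import torch.nn as nn

class SimulatedFourierTransform(nn.Module):
    def __init__(self, n_fft: int = 512):
        super().__init__()
        n = torch.arange(n_fft).float()
        # DFT basis: W[k, t] = exp(-2*pi*i*k*t / N), split into cos/sin parts.
        angles = 2.0 * math.pi * torch.outer(n, n) / n_fft
        self.real = nn.Parameter(torch.cos(angles))   # learnable cosine basis
        self.imag = nn.Parameter(-torch.sin(angles))  # learnable sine basis

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, n_fft) windowed raw-waveform segments.
        # Each part is one batched GEMM on the GPU, instead of per-frame FFTs.
        re = frames @ self.real.T
        im = frames @ self.imag.T
        # Log power spectrum as the feature handed to the classifier backbone.
        return torch.log1p(re.pow(2) + im.pow(2))

# Usage: feed framed raw audio straight into a downstream FCNN/CNN/RNN.
sft = SimulatedFourierTransform(n_fft=512)
features = sft(torch.randn(8, 100, 512))  # (8, 100, 512) log-spectrum features
```

Because the weights stay trainable, backpropagation is free to drift away from the exact DFT basis, which matches the abstract's framing of feature extraction as an adaptive machine learning problem rather than a fixed FFT-plus-filter-bank front end.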
