IEEE International Conference on Acoustics, Speech and Signal Processing

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes



Abstract

The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio waveform was recorded. Recently, deep neural networks have been applied to ASC and have achieved state-of-the-art performance. However, few works have investigated how to visualise and understand what a neural network has learnt from acoustic scenes. Previous work applied local pooling after each convolutional layer, thereby reducing the size of the feature maps. In this paper, we suggest that local pooling is not necessary; rather, the size of the receptive field is what matters. We apply atrous Convolutional Neural Networks (CNNs) with global attention pooling as the classification model. The internal feature maps of the attention model can be visualised and explained. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 dataset, our proposed method achieves an accuracy of 72.7%, significantly outperforming CNNs without dilation, which achieve 60.4%. Furthermore, our results demonstrate that the learnt feature maps contain rich information on acoustic scenes in the time-frequency domain.
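The two ideas the abstract relies on, atrous (dilated) convolution to enlarge the receptive field without pooling, and global attention pooling to aggregate frame-level features into a clip-level embedding, can be sketched in plain Python. This is a minimal 1-D illustration under assumed shapes and function names; it is not the authors' actual 2-D time-frequency model.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D atrous convolution: adjacent kernel taps are
    `dilation` samples apart, so the receptive field grows to
    (len(kernel) - 1) * dilation + 1 without any pooling."""
    k = len(kernel)
    pad = (k - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j in range(k):
            idx = i + j * dilation - pad
            if 0 <= idx < len(x):
                s += kernel[j] * x[idx]
        out.append(s)
    return out

def attention_pool(features, attn_logits):
    """Global attention pooling: a softmax over time turns per-frame
    logits into weights summing to 1; the clip-level value is the
    weighted sum of the per-frame features."""
    m = max(attn_logits)                       # stabilise the softmax
    exps = [math.exp(a - m) for a in attn_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * f for w, f in zip(weights, features)), weights

# A unit impulse through a 3-tap kernel with dilation 2 spreads to taps
# two samples apart, showing the widened receptive field:
print(dilated_conv1d([0, 0, 0, 1, 0, 0, 0], [1.0, 1.0, 1.0], 2))
# With uniform logits, attention pooling reduces to a plain average:
print(attention_pool([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])[0])
```

Because the attention weights are an explicit, normalised distribution over time (or time-frequency) positions, they can be plotted directly, which is the mechanism behind the visualisation perspective the abstract describes.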
