IEEE International Conference on Acoustics, Speech and Signal Processing

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes



Abstract

The goal of Acoustic Scene Classification (ASC) is to recognise the environment in which an audio waveform was recorded. Recently, deep neural networks have been applied to ASC and have achieved state-of-the-art performance. However, few works have investigated how to visualise and understand what a neural network has learnt from acoustic scenes. Previous work applied local pooling after each convolutional layer, thereby reducing the size of the feature maps. In this paper, we suggest that local pooling is not necessary; rather, the size of the receptive field is what matters. We apply atrous Convolutional Neural Networks (CNNs) with global attention pooling as the classification model. The internal feature maps of the attention model can be visualised and explained. On the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 dataset, our proposed method achieves an accuracy of 72.7%, significantly outperforming CNNs without dilation, which achieve 60.4%. Furthermore, our results demonstrate that the learnt feature maps contain rich information on acoustic scenes in the time-frequency domain.
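The two ideas the abstract relies on, atrous (dilated) convolution to enlarge the receptive field without pooling, and global attention pooling to aggregate frame-level features into a clip-level embedding, can be sketched in plain Python. This is a minimal 1-D illustration under assumed shapes and function names; it is not the authors' actual 2-D time-frequency model.

```python
import math

def dilated_conv1d(x, kernel, dilation):
    """'Same'-padded 1-D atrous convolution: adjacent kernel taps are
    `dilation` samples apart, so the receptive field grows to
    (len(kernel) - 1) * dilation + 1 without any pooling."""
    k = len(kernel)
    pad = (k - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j in range(k):
            idx = i + j * dilation - pad
            if 0 <= idx < len(x):
                s += kernel[j] * x[idx]
        out.append(s)
    return out

def attention_pool(features, attn_logits):
    """Global attention pooling: a softmax over time turns per-frame
    logits into weights summing to 1; the clip-level value is the
    weighted sum of the per-frame features."""
    m = max(attn_logits)                       # stabilise the softmax
    exps = [math.exp(a - m) for a in attn_logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * f for w, f in zip(weights, features)), weights

# A unit impulse through a 3-tap kernel with dilation 2 spreads to taps
# two samples apart, showing the widened receptive field:
print(dilated_conv1d([0, 0, 0, 1, 0, 0, 0], [1.0, 1.0, 1.0], 2))
# With uniform logits, attention pooling reduces to a plain average:
print(attention_pool([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])[0])
```

Because the attention weights are an explicit, normalised distribution over time (or time-frequency) positions, they can be plotted directly, which is the mechanism behind the visualisation perspective the abstract describes.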
