Conference: IEEE International Conference on Semantic Computing
Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks

Abstract

Automatic understanding of audio events and acoustic scenes has been an active research topic in the signal processing and machine learning communities. Recognizing acoustic scenes in real-life scenarios is challenging because of the diversity of environmental sounds and uncontrolled environments. Efficient methods and feature representations are needed to cope with these challenges. In this study, we address acoustic scene classification of raw audio signals and propose a cascaded CNN architecture that uses the spatial pyramid pooling (SPP, also referred to as spatial pyramid matching) method to aggregate local features coming from the convolutional layers of the CNN. We use three well-known audio features, namely MFCC, Mel Energy, and spectrogram, to represent audio content and evaluate the effectiveness of our proposed CNN-SPP architecture on the DCASE 2018 acoustic scene performance dataset. Our results show that the proposed CNN-SPP architecture with the spectrogram feature improves the classification accuracy.
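The abstract does not give implementation details, so the following is only a minimal sketch of the general spatial pyramid pooling idea, not the authors' architecture. It assumes a convolutional feature map shaped (channels, frequency, time) and max pooling over 1x1, 2x2, and 4x4 grids; the function name `spatial_pyramid_pool` and the `levels` parameter are hypothetical and chosen for illustration.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) map over a pyramid of grids.

    For each level n the map is split into an n x n grid of bins and the
    per-channel maximum of every bin is kept. The output length is
    channels * sum(n * n for n in levels), independent of height and width.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Integer floor/ceil bin edges so the bins cover the full map
            # even when the height is not divisible by n.
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                bin_max = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
                pooled.append(bin_max)
    return np.concatenate(pooled)


# Two hypothetical feature maps with different time lengths give
# fixed-length vectors of the same size: 8 * (1 + 4 + 16) = 168.
short_clip = spatial_pyramid_pool(np.random.rand(8, 40, 120))
long_clip = spatial_pyramid_pool(np.random.rand(8, 40, 500))
print(short_clip.shape, long_clip.shape)
```

Because the pooled vector has the same length for any input size, a layer like this can sit between the convolutional layers and a fully connected classifier without requiring fixed-length audio clips.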