Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks

Abstract

Automatic understanding of audio events and acoustic scenes is an active research topic in the signal processing and machine learning communities. Recognizing acoustic scenes in real-life scenarios is challenging because environmental sounds are diverse and recording conditions are uncontrolled. Efficient methods and feature representations are needed to cope with these challenges. In this study, we address acoustic scene classification of raw audio signals and propose a cascaded CNN architecture that uses spatial pyramid pooling (SPP, also referred to as spatial pyramid matching) to aggregate local features from the convolutional layers of the CNN. We use three well-known audio features, namely MFCC, Mel energy, and spectrogram, to represent audio content, and we evaluate the effectiveness of the proposed CNN-SPP architecture on the DCASE 2018 acoustic scene performance dataset. Our results show that the proposed CNN-SPP architecture with the spectrogram feature improves classification accuracy.
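
To make the aggregation step concrete, the following is a minimal NumPy sketch of spatial pyramid pooling over a single convolutional feature map. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the pyramid levels, the pooling operator, or the layer sizes, so the `levels=(1, 2, 4)` default, the use of max pooling, and the function name `spatial_pyramid_pool` are all hypothetical choices. The point it shows is that maps of different widths (for example, clips of different duration) pool to vectors of the same length.

```python
import math

import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) map into a fixed-length vector.

    `levels` is an assumed pyramid: level n splits the map into an n x n grid
    and keeps one maximum per channel for every grid cell.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil bin edges make the bins cover the whole axis.
            top = (i * height) // n
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = (j * width) // n
                right = math.ceil((j + 1) * width / n)
                region = feature_map[:, top:bottom, left:right]
                pooled.append(region.max(axis=(1, 2)))
    # Output length is channels * sum(n * n for n in levels).
    return np.concatenate(pooled)


# Two hypothetical feature maps with the same channel count but different widths.
short_clip = np.random.default_rng(0).standard_normal((8, 13, 20))
long_clip = np.random.default_rng(1).standard_normal((8, 13, 57))
print(spatial_pyramid_pool(short_clip).shape)  # (168,)
print(spatial_pyramid_pool(long_clip).shape)   # (168,)
```

With 8 channels and levels 1, 2, and 4, each map yields 8 × (1 + 4 + 16) = 168 values, so a fully connected classifier can follow the pooling step regardless of the input's time-frequency size.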


Bibliographic record

Source
IEEE International Conference on Semantic Computing | 2019 | pp. 128-131 | 4 pages
Authors
Ahmet Melih Basbug; Mustafa Sert
Original format: PDF
Keywords
Task analysis; Spectrogram; Computer architecture; Image analysis; Feature extraction; Mel frequency cepstral coefficient
Indexed: 2022-08-26 13:53:16

Similar literature


1. An optimized convolutional neural network with bottleneck and spatial pyramid pooling layers for classification of foods [J]. Jahani Heravi Elnaz, Habibi Aghdam Hamed, Puig Domenec. Pattern Recognition Letters, 2018, issue APRa1.
2. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks [J]. Qu Tao, Zhang Quanyuan, Sun Shilei. Multimedia Tools and Applications, 2017, issue 20.
3. A novel acoustic scene classification model using the late fusion of convolutional neural networks and different ensemble classifiers [J]. Alamir Mahmoud A. Applied Acoustics, 2021, issue Apra.
4. Acoustic Scene Classification Using Spatial Pyramid Pooling with Convolutional Neural Networks [C]. Ahmet Melih Basbug, Mustafa Sert. IEEE International Conference on Semantic Computing, 2019.
5. Spatial Resolution Impacts on Deep Convolutional Neural Networks Performance of Land Cover Classification [D]. He, Liu. 2019.
6. Group and Shuffle Convolutional Neural Networks with Pyramid Pooling Module for Automated Pterygium Segmentation [O]. Siti Raihanah Abdani, Mohd Asyraf Zulkifley, Nuraisyah Hani Zulkifley. 2021.
7. A Spatial Pyramid Pooling-Based Deep Convolutional Neural Network for the Classification of Electrocardiogram Beats [O]. Jia Li, Yujuan Si, Liuqi Lang. 2018.
