Conference: Pacific-Rim Conference on Multimedia

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features



Abstract

The characteristics of different sound classes vary widely across temporal scales and hierarchical levels. Motivated by this, a novel deep convolutional neural network (CNN) architecture is proposed for the environmental sound classification task. The network takes raw waveforms as input and uses a set of separate parallel CNNs with different convolutional filter sizes and strides to learn feature representations at multiple temporal resolutions. In addition, the architecture aggregates hierarchical features from multiple CNN layers for classification through direct connections between convolutional layers, going beyond the single-level CNN features employed by the majority of previous studies. These direct connections also improve the flow of information and avoid the vanishing gradient problem. Combining multi-level features boosts classification performance significantly. Comparative experiments are conducted on two datasets: the environmental sound classification dataset (ESC-50) and the DCASE 2017 audio scene classification dataset. The results demonstrate that employing multi-temporal resolution and multi-level features makes the proposed method highly effective in these classification tasks, and that it outperforms previous methods that account only for single-level features.
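
The two ideas in the abstract, parallel branches with different filter sizes and strides, and features collected from every layer rather than only the last, can be illustrated with a minimal pure-Python sketch. It is not the authors' network. The abstract does not state the number of branches, the filter sizes, the strides, or the depth, so every value in `BRANCH_SPECS` is a hypothetical choice. The filters are random rather than learned, each layer has a single channel, and global max pooling followed by concatenation stands in for the direct connections between layers; these are all assumptions made for illustration.

```python
import random


def conv1d(signal, kernel, stride):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        acc = sum(signal[start + i] * kernel[i] for i in range(k))
        out.append(max(0.0, acc))
    return out


def global_max_pool(feature_map):
    """Collapse a feature map to one value so levels of any length can be joined."""
    return max(feature_map)


def random_kernel(rng, size):
    # Random weights stand in for learned filters (assumption for this sketch).
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def branch_features(waveform, first_kernel, first_stride, deeper_kernels):
    """Run one branch and keep a pooled feature from EVERY level, not just the last."""
    pooled = []
    x = conv1d(waveform, first_kernel, first_stride)
    pooled.append(global_max_pool(x))
    for kernel in deeper_kernels:
        x = conv1d(x, kernel, 1)
        pooled.append(global_max_pool(x))
    return pooled


def multi_resolution_features(waveform, branch_specs, levels=3, seed=0):
    """Concatenate multi-level features from all parallel branches."""
    rng = random.Random(seed)
    features = []
    for kernel_size, stride in branch_specs:
        first = random_kernel(rng, kernel_size)
        deeper = [random_kernel(rng, 3) for _ in range(levels - 1)]
        features.extend(branch_features(waveform, first, stride, deeper))
    return features


# Hypothetical (filter size, stride) pairs: fine, medium, and coarse resolution.
BRANCH_SPECS = [(11, 1), (51, 5), (101, 10)]

rng = random.Random(42)
waveform = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # stand-in for raw audio
features = multi_resolution_features(waveform, BRANCH_SPECS)
print(len(features))  # 3 branches x 3 levels = 9
```

In a trained model each level would contribute a vector with one entry per channel, and the concatenated vector would feed a classifier. A single-level baseline would correspond to keeping only the last pooled value of each branch.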
