Published in: IEEE International Conference on Acoustics, Speech and Signal Processing

CNN-LTE: A class of 1-X pooling convolutional neural networks on label tree embeddings for audio scene classification

Abstract

This work presents an approach for audio scene classification. First, given the label set of the scenes, a label tree is automatically constructed in which the labels are grouped into meta-classes. This category taxonomy is then used in the feature extraction step, in which an audio scene instance is transformed into a label tree embedding image. Elements of the image indicate the likelihoods that the scene instance belongs to different meta-classes. Finally, a class of simple 1-X (i.e. 1-max, 1-mean, and 1-mix) pooling convolutional neural networks, tailored for the task at hand, is learned on top of the image features for scene recognition. Experimental results on the DCASE 2013 and DCASE 2016 datasets demonstrate the efficiency of the proposed method.
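The abstract does not spell out the network architecture, so the following is only a minimal sketch of what 1-X pooling over a convolutional feature map might look like. It assumes the embedding image has one row per meta-class and one column per time segment, that each filter spans the full image height, that a ReLU follows the convolution, and that 1-mix simply concatenates the 1-max and 1-mean outputs. All function names, shapes, and values are hypothetical and are not taken from the paper.

```python
def conv_feature_map(image, kernel):
    """Slide a full-height kernel along the time axis (columns).

    Returns a 1-D feature map with ReLU applied (an assumed nonlinearity).
    """
    n_rows = len(image)
    n_cols = len(image[0])
    width = len(kernel[0])
    feature_map = []
    for start in range(n_cols - width + 1):
        total = 0.0
        for r in range(n_rows):
            for k in range(width):
                total += image[r][start + k] * kernel[r][k]
        feature_map.append(max(total, 0.0))  # ReLU
    return feature_map


def one_max(feature_map):
    """1-max pooling: keep only the largest activation of the whole map."""
    return max(feature_map)


def one_mean(feature_map):
    """1-mean pooling: average all activations of the map into one value."""
    return sum(feature_map) / len(feature_map)


def one_mix(feature_map):
    """Assumed form of 1-mix pooling: concatenate 1-max and 1-mean outputs."""
    return [one_max(feature_map), one_mean(feature_map)]


# Toy embedding image: 2 meta-class likelihood rows x 4 time segments.
image = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.8],
]
# Toy filter of width 2 that responds when the first meta-class dominates.
kernel = [
    [1.0, 1.0],
    [-1.0, -1.0],
]

fm = conv_feature_map(image, kernel)
pooled_max = one_max(fm)
pooled_mean = one_mean(fm)
pooled_mix = one_mix(fm)
```

Whatever the pooling variant, each filter's feature map collapses to a fixed-size output regardless of how many time segments the scene contains, which is what would let a classifier sit on top of variable-length recordings.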
