International Joint Conference on Neural Networks

Bag-of-Deep-Features: Noise-Robust Deep Feature Representations for Audio Analysis

Abstract

In the era of deep learning, research into classifying components of the acoustic environment, especially from in-the-wild recordings, is gaining popularity. This is due in part to increasing computational capacity and the growing amount of real-world data available on social multimedia. However, the noisy nature of this data adds further complexity to already complex deep learning systems. Herein, we tackle this issue by quantising deep feature representations of various in-the-wild audio data sets. The aim of this paper is twofold: 1) to assess the feasibility of the proposed feature quantisation task, and 2) to compare the efficacy of various feature spaces extracted from different fully connected deep neural networks for classifying six real-world audio corpora. For the classification, we extract two feature sets: i) DEEP SPECTRUM features, which are derived by forwarding visual representations of the audio instances, in particular mel-spectrograms, through very deep task-independent pre-trained Convolutional Neural Networks (CNNs), and ii) Bag-of-Deep-Features (BODF), which are the quantisation of the DEEP SPECTRUM features. Using BODF, we show the suitability of quantising the deep representations for noisy in-the-wild audio data. Finally, we analyse the effect of early and late fusion of the CNN features and models on the classification results.
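
To make the quantisation step concrete, the following is a minimal sketch of one common bag-of-words style scheme, not the authors' implementation. The abstract does not say how the codebook is built or how counts are weighted, so the random-sampling codebook, the nearest-codeword assignment, and the optional log weighting are all assumptions. The function names (`build_codebook`, `bag_of_deep_features`) and the random stand-in data are hypothetical; in practice the input vectors would be deep features extracted from mel-spectrograms by a pre-trained CNN.

```python
import math
import random


def build_codebook(train_vectors, codebook_size, seed=0):
    """Assumption: the codebook is a random sample of training feature vectors."""
    rng = random.Random(seed)
    return rng.sample(train_vectors, codebook_size)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_deep_features(vectors, codebook, log_tf=True):
    """Quantise one recording's feature vectors into a codeword histogram."""
    counts = [0] * len(codebook)
    for vec in vectors:
        # Assign each feature vector to its single nearest codeword.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(vec, codebook[k]),
        )
        counts[nearest] += 1
    if log_tf:
        # Assumption: log(1 + count) weighting to compress large counts.
        return [math.log1p(c) for c in counts]
    return [float(c) for c in counts]


# Hypothetical stand-in data: random vectors in place of real deep features.
data_rng = random.Random(42)
train_vectors = [[data_rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
recording = [[data_rng.gauss(0, 1) for _ in range(16)] for _ in range(25)]

codebook = build_codebook(train_vectors, codebook_size=8)
bodf = bag_of_deep_features(recording, codebook, log_tf=False)
print(len(bodf), sum(bodf))
```

The resulting fixed-length histogram can be passed to any standard classifier. Because each vector is replaced by a codeword index, small perturbations that do not change the nearest codeword leave the representation unchanged, which is one plausible reading of why quantisation might help with noisy data.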
