Bag-of-Deep-Features: Noise-Robust Deep Feature Representations for Audio Analysis

Abstract

In the era of deep learning, research into the classification of various components of the acoustic environment, especially in-the-wild recordings, is gaining in popularity. This is due in part to increasing computational capacity and the growing amount of real-world data available on social multimedia. However, the noisy nature of this data can add further complexity to already complex deep learning systems. Herein, we tackle this issue by quantising deep feature representations of various in-the-wild audio data sets. The aim of this paper is twofold: 1) to assess the feasibility of the proposed feature quantisation task, and 2) to compare the efficacy of various feature spaces extracted from different fully connected deep neural networks to classify six real-world audio corpora. For the classification, we extract two feature sets: i) DEEP SPECTRUM features, which are derived from forwarding the visual representations of the audio instances, in particular mel-spectrograms, through very deep task-independent pre-trained Convolutional Neural Networks (CNNs), and ii) Bag-of-Deep-Features (BODF), which is the quantisation of the DEEP SPECTRUM features. Using BODF, we show the suitability of quantising the deep representations for noisy in-the-wild audio data. Finally, we analyse the effect of early and late fusion of the CNN features and models on the classification results.
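
The abstract does not spell out how the deep features are quantised. As a rough illustration only, the sketch below assumes a bag-of-words style scheme: each feature vector of a recording is assigned to its nearest codeword in a codebook, and the recording is then represented by the histogram of assignments. The function names, the random-sampling codebook, the Euclidean distance, and the optional log weighting are all assumptions made for this example, not details taken from the paper. The toy vectors stand in for DEEP SPECTRUM features, which in practice would come from a pre-trained CNN.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(training_vectors, size, seed=0):
    """Assumed codebook strategy: pick `size` training vectors at random."""
    rng = random.Random(seed)
    return rng.sample(training_vectors, size)


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def bag_of_features(vectors, codebook, log_weighting=True):
    """Quantise a list of feature vectors into one fixed-length histogram."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest_codeword(v, codebook)] += 1
    if log_weighting:
        # Optional log(1 + count) weighting (an assumption) to compress large counts.
        return [math.log(1 + c) for c in counts]
    return [float(c) for c in counts]


# Hypothetical 2-D "deep features" for one recording, plus a slightly perturbed copy.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
clean = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [5.2, 4.8]]
noisy = [[v[0] + 0.2, v[1] - 0.2] for v in clean]

bag_clean = bag_of_features(clean, codebook, log_weighting=False)
bag_noisy = bag_of_features(noisy, codebook, log_weighting=False)
print(bag_clean)  # [1.0, 2.0, 1.0]
print(bag_noisy)  # [1.0, 2.0, 1.0]
```

In this toy case the small perturbation leaves every nearest-codeword assignment unchanged, so both bags are identical. That is only an intuition for why quantisation might dampen noise; it says nothing about the results reported in the paper.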

Bibliographic details

Source
International Joint Conference on Neural Networks, 2018, pp. 1-7 (7 pages)
Authors
Shahin Amiriparian; Maurice Gerczuk; Sandra Ottl; Nicholas Cummins; Sergey Pugachevskiy; Björn Schuller;
Original format: PDF
Keywords
Feature extraction; Task analysis; Quantization (signal); Spectrogram; Videos; Audio recording; Image color analysis;

Similar literature

1. A Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual Speech Recognition [J]. Borgstrom B.J., Alwan A. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans. 2008, No. 6
2. Feature-Level Change Detection Using Deep Representation and Feature Change Analysis for Multispectral Imagery [J]. Hui Zhang, Maoguo Gong, Puzhao Zhang. IEEE Geoscience and Remote Sensing Letters. 2016, No. 11
3. An effective analysis of deep learning based approaches for audio based feature extraction and its visualization [J]. Dhiraj, Biswas Rohit, Ghattamaraju Nischay. Multimedia Tools and Applications. 2019, No. 17
4. Bag-of-Deep-Features: Noise-Robust Deep Feature Representations for Audio Analysis [C]. Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl. International Joint Conference on Neural Networks. 2018
5. Advanced Music Audio Feature Learning with Deep Networks [D]. Daigneau, Madeleine. 2017
6. Feature Representations for Neuromorphic Audio Spike Streams [O]. Jithendar Anumula, Daniel Neil, Tobi Delbruck. 2018
7. Automated Recognition of Alzheimer’s Dementia Using Bag-of-Deep-Features and Model Ensembling [O]. Zafi Sherhan Syed, Muhammad Shehram Shah Syed, Margaret Lech. 2021
