Classification of audio scenes with novel features in a fused system framework

Waldekar Shefali; Saha Goutam

Abstract

The rapidly increasing requirements from context-aware gadgets, like smartphones and intelligent wearable devices, along with applications such as audio archiving, have given a fillip to the research in the field of Acoustic Scene Classification (ASC). The Detection and Classification of Acoustic Scenes and Events (DCASE) challenges have seen systems addressing the problem of ASC from different directions. Some of them could achieve better results than the Mel Frequency Cepstral Coefficients - Gaussian Mixture Model (MFCC-GMM) baseline system. However, a collective decision from all participating systems was found to surpass the accuracy obtained by each system. The simultaneous use of various approaches can exploit the discriminating information in a better way for audio collected from different environments covering audible-frequency range in varying degrees. In this work, we show that the frame level statistics of some well-known spectral features when fed to Support Vector Machine (SVM) classifier individually, are able to outperform the baseline system of DCASE challenges. Furthermore, we analyzed different methods of combining these features, and also of combining information from two channels when the data is in binaural format. The proposed approach resulted in around 17% and 9% relative improvement in accuracy with respect to the baseline system on the development and evaluation dataset, respectively, from DCASE 2016 ASC task. (C) 2018 Elsevier Inc. All rights reserved.
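The pipeline outlined in the abstract (frame-level statistics, a classifier per feature, then combination across features or channels) can be sketched roughly as follows. The abstract names neither the statistics nor the fusion rule, so this sketch assumes per-coefficient mean and standard deviation for pooling and a weighted average of per-class scores for fusion. The SVM stage is left out, and the function names and all numbers are hypothetical illustrations, not values from the paper.

```python
from statistics import mean, pstdev


def clip_statistics(frames):
    """Pool a clip's frame-level feature vectors into one fixed-length vector.

    frames: list of equal-length lists, one list per frame.
    Returns per-coefficient means followed by per-coefficient standard
    deviations (an assumed choice of statistics).
    """
    columns = list(zip(*frames))  # transpose: one tuple per coefficient
    return [mean(c) for c in columns] + [pstdev(c) for c in columns]


def fuse_scores(score_sets, weights=None):
    """Weighted average of per-class scores from several systems or channels."""
    if weights is None:
        weights = [1.0 / len(score_sets)] * len(score_sets)
    classes = score_sets[0].keys()
    return {c: sum(w * s[c] for w, s in zip(weights, score_sets)) for c in classes}


def relative_improvement(baseline_acc, system_acc):
    """Relative gain in accuracy over the baseline, in percent."""
    return 100.0 * (system_acc - baseline_acc) / baseline_acc


# Toy, made-up inputs purely for illustration (not from the paper).
frames = [[1.0, 4.0], [3.0, 8.0]]          # two frames, two coefficients
clip_vector = clip_statistics(frames)      # [2.0, 6.0, 1.0, 2.0]

left = {"park": 0.2, "bus": 0.8}           # hypothetical left-channel scores
right = {"park": 0.6, "bus": 0.4}          # hypothetical right-channel scores
fused = fuse_scores([left, right])
predicted = max(fused, key=fused.get)      # class with the highest fused score
```

In a fuller version, `clip_vector` would be the input to a classifier such as an SVM, and `fuse_scores` would combine that classifier's outputs across features or across the two binaural channels. `relative_improvement` shows how a figure like the reported 17% is computed: the gain is divided by the baseline accuracy rather than quoted in absolute points.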

Bibliographic record

Source
Digital Signal Processing | 2018, issue 2018 | 12 pages
Authors
Waldekar Shefali; Saha Goutam
Author affiliations
IIT Kharagpur, Dept Elect & Elect Commun Engn, Kharagpur, W Bengal, India (both authors)
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Digital signal processing
Keywords
Block-based MFCC; CQCC; Environmental sounds; Fusion; Machine listening; SCFC
