Journal: Applied Acoustics

Two-level fusion-based acoustic scene classification

Abstract

Growing demand from applications such as surveillance, archiving, and context-aware devices has fuelled research into efficiently extracting useful information from environmental sounds. Acoustic scene classification (ASC) assigns a textual label to an audio segment based on the general characteristics of a location or situation. Because audio scenes differ in nature, a single feature-classifier pair may not discriminate efficiently among environments. The set of acoustic scenes may also vary with the problem under investigation. For most ASC applications, however, a general estimate of the type of surroundings (e.g., indoor or outdoor) may be enough, rather than an explicit scene label (such as home or park). In this paper, we propose a two-level hierarchical framework for ASC in which coarse classification is followed by finer labels. At the first level, texture features extracted from time-frequency representations of the audio samples are used to generate the coarse labels. At the second level, the system explores combinations of six well-known spectral features, which have been used successfully in different audio processing fields, to give finer details of the audio scene. The proposed system is compared with baseline methods on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 and 2017 ASC databases and is found to be superior in terms of classification accuracy. Additionally, the proposed hierarchical method provides coarse labels as intermediate results, which may be useful in certain applications. (C) 2020 Elsevier Ltd. All rights reserved.
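The coarse-then-fine flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifiers, so a nearest-centroid rule stands in at both levels, and short toy vectors stand in for the texture features (first level) and spectral features (second level). The class names, the `fine_to_coarse` mapping, and the "office" and "street" labels are hypothetical; only "home", "park", "indoor", and "outdoor" appear in the abstract as examples.

```python
from collections import defaultdict
from math import dist


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class NearestCentroid:
    """Stand-in classifier (an assumption, not the paper's choice)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {lab: centroid(rows) for lab, rows in groups.items()}
        return self

    def predict_one(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


class TwoLevelClassifier:
    """Level 1 picks a coarse label; level 2 refines it within that group."""

    def __init__(self, fine_to_coarse):
        self.fine_to_coarse = fine_to_coarse

    def fit(self, X_coarse, X_fine, y_fine):
        y_coarse = [self.fine_to_coarse[lab] for lab in y_fine]
        self.coarse = NearestCentroid().fit(X_coarse, y_coarse)
        self.fine = {}
        for group in set(y_coarse):
            idx = [i for i, c in enumerate(y_coarse) if c == group]
            # One fine classifier per coarse group, trained on that group only.
            self.fine[group] = NearestCentroid().fit(
                [X_fine[i] for i in idx], [y_fine[i] for i in idx]
            )
        return self

    def predict_one(self, x_coarse, x_fine):
        group = self.coarse.predict_one(x_coarse)
        return group, self.fine[group].predict_one(x_fine)


# Hypothetical label hierarchy and toy training data (one sample per scene).
fine_to_coarse = {"home": "indoor", "office": "indoor",
                  "park": "outdoor", "street": "outdoor"}
y_fine = ["home", "office", "park", "street"]
X_coarse = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]  # toy "texture"
X_fine = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]    # toy "spectral"

model = TwoLevelClassifier(fine_to_coarse).fit(X_coarse, X_fine, y_fine)
print(model.predict_one([0.95, 0.95], [0.9, 0.8]))  # ('outdoor', 'street')
```

In the toy data the second-level vectors for "home" and "park" are identical, so the fine label is only resolvable once the first level has chosen between indoor and outdoor. The coarse label is returned alongside the fine one, mirroring the abstract's point that the intermediate result can be useful on its own.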
