Digital Signal Processing (journal)

A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features


Abstract

Machine hearing is an emerging research field, analogous to machine vision, that aims to equip computers with the ability to hear and recognise a variety of sounds. It is a key enabler of natural human-computer speech interfacing, as well as of applications such as automated security surveillance, environmental monitoring, and smart homes/buildings/cities. Recent advances in machine learning allow current systems to accurately recognise a diverse range of sounds under controlled conditions. However, doing so in real-world noisy conditions remains a challenging task. Several front-end feature extraction methods have been used for machine hearing, employing speech recognition features like MFCC and PLP, as well as image-like features such as AIM and SIF. The best choice of feature is found to depend on the noise environment and the machine learning techniques used. Machine learning methods such as deep neural networks have been shown capable of inferring discriminative classification rules from less structured front-end features in related domains. In the machine hearing field, spectrogram image features have recently shown good performance for noise-corrupted classification using deep neural networks. However, there are many methods of extracting features from spectrograms. This paper explores a novel data-driven feature extraction method that uses variance-based criteria to define spectral pooling of features from spectrograms. The proposed method, based on maximising the pooled spectral variance of foreground and background sound models, is shown to achieve very good performance for robust classification. (C) 2016 Elsevier Inc. All rights reserved.
