Conference: International Joint Conference on Neural Networks

End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input


Abstract

Sound event detection systems typically consist of two stages: extracting hand-crafted features from the raw audio waveform, and learning a mapping between these features and the target sound events using a classifier. Recently, the focus of sound event detection research has mostly shifted to the latter stage, using standard features such as the mel spectrogram as input to classifiers such as deep neural networks. In this work, we take an end-to-end approach and propose to combine these two stages in a single deep neural network classifier. Feature extraction over the raw waveform is performed by a feedforward layer block whose parameters are initialized to extract time-frequency representations. The feature extraction parameters are updated during training, resulting in a representation that is optimized for the specific task. This feature extraction block is followed by (and jointly trained with) a convolutional recurrent network, which has recently given state-of-the-art results in many sound recognition tasks. The proposed system does not outperform a convolutional recurrent network with fixed hand-crafted features. The final magnitude spectrum characteristics of the feature extraction block parameters indicate that the most relevant information for the given task is contained in the 0-3 kHz frequency range, and this is also supported by the empirical results on sound event detection (SED) performance.
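The abstract does not spell out how the feedforward block is initialized. The following is a minimal sketch of one plausible reading: two dense weight matrices are set to the real and imaginary parts of the discrete Fourier transform (DFT) basis, so that before any training the block reproduces an ordinary magnitude spectrum. The function names, the frame length of 512, the hop of 256, and the use of NumPy instead of a deep learning framework are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def dft_init_weights(frame_len):
    """Hypothetical helper: DFT-basis weights for two bias-free dense layers.

    Returns (w_real, w_imag), each of shape (frame_len, frame_len // 2 + 1).
    In a trainable model these would be the initial values of the layer
    parameters and would then be updated by backpropagation.
    """
    n = np.arange(frame_len)[:, None]            # sample index within a frame
    k = np.arange(frame_len // 2 + 1)[None, :]   # non-negative frequency bins
    angle = 2.0 * np.pi * n * k / frame_len
    return np.cos(angle), -np.sin(angle)

def frame_signal(x, frame_len, hop):
    """Slice a 1-D waveform into overlapping frames (no padding, no window)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def magnitude_features(frames, w_real, w_imag):
    """Apply the two linear layers and combine their outputs into a magnitude."""
    re = frames @ w_real
    im = frames @ w_imag
    return np.sqrt(re ** 2 + im ** 2)

# Assumed example input: one second of noise at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
frames = frame_signal(x, frame_len=512, hop=256)
w_real, w_imag = dft_init_weights(512)
feats = magnitude_features(frames, w_real, w_imag)
```

At initialization `feats` matches the magnitude of `np.fft.rfft` applied to each frame. Once `w_real` and `w_imag` are treated as trainable parameters, the representation is free to drift away from the exact DFT, which is the kind of task-specific adaptation the abstract describes.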
