IEEE Journal of Selected Topics in Signal Processing

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Abstract

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. The proposed network takes a sequence of consecutive spectrogram time frames as input and maps it to two outputs in parallel. For the first output, sound event detection (SED) is performed as a multi-label classification task on each time frame, producing the temporal activity of all sound event classes. For the second output, localization is performed by estimating the 3-D Cartesian coordinates of the direction of arrival (DOA) of each sound event class using multi-output regression. The proposed method is able to associate multiple DOAs with their respective sound event labels and to track this association over time. The method uses the phase and magnitude components of the spectrogram computed on each audio channel as separate features, thereby avoiding any method- or array-specific feature extraction. The method is evaluated on five Ambisonic and two circular-array format datasets with different numbers of overlapping sound events in anechoic, reverberant, and real-life scenarios. It is compared with two SED baselines, three DOA estimation baselines, and one SELD baseline. The results show that the proposed method is generic and applicable to any array structure, and robust to unseen DOA values, reverberation, and low-SNR scenarios. The proposed method achieved a consistently higher recall of the estimated number of DOAs across datasets than the best baseline. Additionally, this recall was observed to be significantly better than that of the best baseline method for higher numbers of overlapping sound events.
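As a rough illustration of the input and output representations described in the abstract, the NumPy sketch below (not the authors' code) computes per-channel magnitude and phase spectrograms as separate input maps, then decodes the two output branches: frame-wise SED probabilities are thresholded, and the regressed Cartesian DOA of each active class is converted to azimuth and elevation, which keeps every DOA tied to its event label frame by frame. The CRNN itself is omitted; `sed_probs` and `doa_xyz` merely stand in for its two outputs. The function names, the STFT settings (`n_fft=512`, `hop=256`, Hann window), the 0.5 detection threshold, and the frame-level definition used in `doa_count_recall` are all hypothetical choices made for illustration.

```python
import numpy as np


def extract_features(audio, n_fft=512, hop=256):
    """Magnitude and phase spectrograms of every channel, kept as separate maps.

    audio: array of shape (n_channels, n_samples), n_samples >= n_fft.
    Returns shape (2 * n_channels, n_frames, n_fft // 2 + 1): magnitudes of all
    channels first, then their phases (radians). n_fft, hop and the Hann window
    are illustrative assumptions, not the paper's settings.
    """
    n_samples = audio.shape[1]
    window = np.hanning(n_fft)
    n_frames = 1 + (n_samples - n_fft) // hop
    # Sample indices of every frame: shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[:, idx] * window            # (n_channels, n_frames, n_fft)
    spec = np.fft.rfft(frames, axis=-1)        # one-sided STFT per channel
    return np.concatenate([np.abs(spec), np.angle(spec)], axis=0)


def decode_seld(sed_probs, doa_xyz, threshold=0.5):
    """Pair each active class with its regressed DOA, frame by frame.

    sed_probs: (n_frames, n_classes) per-class activity probabilities (SED branch).
    doa_xyz:   (n_frames, n_classes, 3) Cartesian DOA per class (DOA branch).
    threshold: hypothetical detection threshold.
    Returns one list per frame of (class_index, azimuth_deg, elevation_deg).
    """
    events = []
    for t in range(sed_probs.shape[0]):
        frame_events = []
        # Only classes detected as active in this frame contribute a DOA.
        for c in np.flatnonzero(sed_probs[t] > threshold):
            x, y, z = doa_xyz[t, c]
            azimuth = np.degrees(np.arctan2(y, x))
            elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))
            frame_events.append((int(c), float(azimuth), float(elevation)))
        events.append(frame_events)
    return events


def doa_count_recall(pred_counts, ref_counts):
    """Fraction of frames whose number of estimated DOAs equals the reference count.

    ASSUMPTION: a frame-level reading of "recall of the estimated number of DOAs".
    """
    pred_counts = np.asarray(pred_counts)
    ref_counts = np.asarray(ref_counts)
    return float(np.mean(pred_counts == ref_counts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal((4, 16000))    # e.g. 4-channel Ambisonic, 1 s at 16 kHz
    print(extract_features(audio).shape)       # (8, 61, 257)

    # Toy network outputs: 2 frames, 2 classes.
    sed_probs = np.array([[0.9, 0.2], [0.7, 0.8]])
    doa_xyz = np.zeros((2, 2, 3))
    doa_xyz[0, 0] = [1.0, 0.0, 0.0]   # +x axis (azimuth 0, elevation 0)
    doa_xyz[1, 0] = [0.0, 1.0, 0.0]   # +y axis (azimuth 90 degrees)
    doa_xyz[1, 1] = [0.0, 0.0, 1.0]   # +z axis (elevation 90 degrees)
    events = decode_seld(sed_probs, doa_xyz)
    counts = [len(e) for e in events]
    print(counts)                               # [1, 2]
    print(doa_count_recall(counts, [1, 1]))     # 0.5
```

Keeping one DOA triplet per class, rather than a single unlabeled list of DOAs, is what makes the label association trivial in this sketch: the class index of an active SED output directly selects its DOA.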