Published in: IEEE Journal of Selected Topics in Signal Processing

Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks



Abstract

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. The proposed network takes a sequence of consecutive spectrogram time frames as input and maps it to two outputs in parallel. As the first output, sound event detection (SED) is performed as a multi-label classification task on each time frame, producing temporal activity for all the sound event classes. As the second output, localization is performed by estimating the 3-D Cartesian coordinates of the direction-of-arrival (DOA) for each sound event class using multi-output regression. The proposed method is able to associate multiple DOAs with their respective sound event labels and to track this association over time. The proposed method uses the phase and magnitude components of the spectrogram, calculated on each audio channel, as separate features, thereby avoiding any method- and array-specific feature extraction. The method is evaluated on five Ambisonic and two circular array format datasets with different overlapping sound events in anechoic, reverberant, and real-life scenarios. The proposed method is compared with two SED baselines, three DOA estimation baselines, and one SELD baseline. The results show that the proposed method is generic and applicable to any array structure, and robust to unseen DOA values, reverberation, and low SNR scenarios. The proposed method achieved a consistently higher recall of the estimated number of DOAs across datasets in comparison to the best baseline. Additionally, this recall was observed to be significantly better than that of the best baseline method for a higher number of overlapping sound events.
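The input and output formats described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the STFT settings (`n_fft`, `hop`, Hann window), the feature ordering, the 0.5 detection threshold, and the azimuth/elevation convention are all assumptions made for illustration. The first function stacks per-channel magnitude and phase spectrograms as features; the second reads a pair of network outputs (per-class activity plus per-class Cartesian DOA) back into labelled directions.

```python
import numpy as np


def stft_mag_phase(audio, n_fft=512, hop=256):
    """Per-channel magnitude and phase features (hypothetical helper).

    audio: array of shape (n_channels, n_samples).
    Returns an array of shape (n_frames, n_fft // 2 + 1, 2 * n_channels),
    ordered as [mag_ch0, phase_ch0, mag_ch1, phase_ch1, ...].
    """
    n_channels, n_samples = audio.shape
    window = np.hanning(n_fft)
    n_frames = 1 + (n_samples - n_fft) // hop
    feats = []
    for ch in range(n_channels):
        # Slice the channel into overlapping, windowed frames.
        frames = np.stack([
            audio[ch, i * hop:i * hop + n_fft] * window
            for i in range(n_frames)
        ])
        spec = np.fft.rfft(frames, axis=-1)  # (n_frames, n_fft // 2 + 1)
        feats.append(np.abs(spec))    # magnitude component
        feats.append(np.angle(spec))  # phase component
    return np.stack(feats, axis=-1)


def decode_outputs(sed_probs, doa_xyz, threshold=0.5):
    """Pair each active class with its estimated DOA (hypothetical helper).

    sed_probs: (n_frames, n_classes) multi-label activity probabilities.
    doa_xyz:   (n_frames, n_classes, 3) regressed Cartesian DOA per class.
    Returns a list of (frame, class, azimuth_deg, elevation_deg).
    """
    events = []
    active = sed_probs > threshold  # assumed threshold, not from the paper
    for t, c in zip(*np.nonzero(active)):
        x, y, z = doa_xyz[t, c]
        azimuth = np.degrees(np.arctan2(y, x))
        elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))
        events.append((int(t), int(c), float(azimuth), float(elevation)))
    return events


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    four_channel_audio = rng.standard_normal((4, 2048))
    print(stft_mag_phase(four_channel_audio).shape)
```

Because each class carries its own DOA vector, the class label and its direction stay associated frame by frame, which is the property the abstract highlights. The features also require no knowledge of the array geometry, only the raw multichannel signal.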
