Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Adavanne Sharath; Politis Archontis; Nikunen Joonas; Virtanen Tuomas

Abstract

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3-D) space. The proposed network takes a sequence of consecutive spectrogram time frames as input and maps it to two outputs in parallel. As the first output, sound event detection (SED) is performed as a multi-label classification task on each time frame, producing temporal activity for all the sound event classes. As the second output, localization is performed by estimating the 3-D Cartesian coordinates of the direction-of-arrival (DOA) for each sound event class using multi-output regression. The proposed method is able to associate multiple DOAs with their respective sound event labels and to track this association over time. The proposed method uses the phase and magnitude components of the spectrogram, calculated separately on each audio channel, as the feature, thereby avoiding any method- and array-specific feature extraction. The method is evaluated on five Ambisonic and two circular array format datasets with different overlapping sound events in anechoic, reverberant, and real-life scenarios. The proposed method is compared with two SED, three DOA estimation, and one SELD baseline. The results show that the proposed method is generic and applicable to any array structure, and robust to unseen DOA values, reverberation, and low SNR scenarios. The proposed method achieved a consistently higher recall of the estimated number of DOAs across datasets in comparison to the best baseline. Additionally, this recall was observed to be significantly better than that of the best baseline method for a higher number of overlapping sound events.
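
The two parallel outputs described in the abstract can be illustrated with a minimal decoding sketch for a single time frame. It is not the authors' implementation. The function name `decode_frame`, the 0.5 activity threshold, and the angle convention (azimuth measured in the x-y plane from the x-axis, elevation measured from that plane) are assumptions made for illustration only; the abstract states only that SED is a per-frame multi-label classification and that the DOA is regressed as 3-D Cartesian coordinates per class.

```python
import math

def decode_frame(sed_probs, doa_xyz, threshold=0.5):
    """Combine the two per-frame outputs into {class_index: (azimuth_deg, elevation_deg)}.

    sed_probs: one activity probability per sound event class (SED output).
    doa_xyz:   one (x, y, z) Cartesian DOA estimate per class (regression output).
    threshold: assumed activity threshold; not taken from the paper.
    """
    active = {}
    for k, (p, (x, y, z)) in enumerate(zip(sed_probs, doa_xyz)):
        if p < threshold:
            continue  # class k is treated as inactive in this frame
        # Assumed convention: azimuth in the x-y plane, elevation above it.
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
        active[k] = (azimuth, elevation)
    return active

# Hypothetical frame with three classes; classes 0 and 2 overlap in time.
sed_probs = [0.9, 0.2, 0.7]
doa_xyz = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
result = decode_frame(sed_probs, doa_xyz)
print(result)
```

Because each class index carries both its activity and its DOA, a label stays tied to its direction from frame to frame, which is the label-to-DOA association the abstract describes.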

Bibliographic details

Source
Selected Topics in Signal Processing, IEEE Journal of | 2019, Issue 1 | pp. 34-48 | 15 pages
Authors
Adavanne Sharath; Politis Archontis; Nikunen Joonas; Virtanen Tuomas;
Author affiliations

Tampere Univ Technol, Signal Proc Lab, Tampere 33720, Finland;

Aalto Univ, Dept Signal Proc & Acoust, Espoo 02150, Finland;

Tampere Univ Technol, Signal Proc Lab, Tampere 33720, Finland;

Tampere Univ Technol, Signal Proc Lab, Tampere 33720, Finland;

Original format: PDF
Language: English
Keywords
Sound event detection; direction of arrival estimation; convolutional recurrent neural network;

Similar literature

1. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks [J] . Adavanne Sharath, Politis Archontis, Nikunen Joonas, Selected Topics in Signal Processing, IEEE Journal of . 2019, Issue 1
2. Polyphonic Sound Event Detection Based on Residual Convolutional Recurrent Neural Network With Semi-Supervised Loss Function [J] . Nam Kyun Kim, Hong Kook Kim Quality Control, Transactions . 2021, Issue 1
3. Convolutional recurrent neural networks with multi-sized convolution filters for sound-event recognition [J] . Huang Feizhen, Zeng Jinfang, Zhang Yu, Modern Physics Letters, B. Condensed Matter Physics, Statistical Physics, Applied Physics . 2020, Issue 23
4. Sound Event Localization and Detection Using Convolutional Recurrent Neural Networks and Gated Linear Units [C] . Tatsuya Komatsu, Masahito Togami, Tsubasa Takahashi European Signal Processing Conference . 2020
5. Convolutional and recurrent neural networks for pedestrian detection [D] . Balaji, Vivek Arvind. 2016
6. Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones [O] . Nam Kyun Kim, Kwang Myung Jeon, Hong Kook Kim 2019
7. Hierarchical Detection of Sound Events and their Localization Using Convolutional Neural Networks with Adaptive Thresholds [O] . Sotirios Panagiotis Chytas, Gerasimos Potamianos 2019
