IEEE/ACM Transactions on Audio, Speech, and Language Processing

Sound Event Detection and Time–Frequency Segmentation from Weakly Labelled Data



Abstract

Sound event detection (SED) aims to detect when sound events happen in an audio clip and to recognize what they are. Many supervised SED algorithms rely on strongly labelled data that contains the onset and offset annotations of sound events. However, many audio tagging datasets are weakly labelled, that is, only the presence of the sound events is known, without knowing their onset and offset annotations. In this paper, we propose a time-frequency (T-F) segmentation framework trained on weakly labelled data to tackle the sound event detection and separation problem. In training, a segmentation mapping is applied to a T-F representation, such as the log mel spectrogram of an audio clip, to obtain T-F segmentation masks of sound events. The T-F segmentation masks can be used to separate the sound events from the background scenes in the T-F domain. Then, a classification mapping is applied to the T-F segmentation masks to estimate the presence probabilities of the sound events. We model the segmentation mapping using a convolutional neural network and the classification mapping using global weighted rank pooling. In SED, predicted onset and offset times can be obtained from the T-F segmentation masks. As a byproduct, separated waveforms of sound events can be obtained from the T-F segmentation masks. We remixed the DCASE 2018 Task 1 acoustic scene data with the DCASE 2018 Task 2 sound events data. When mixing at 0 dB, the proposed method achieved F1 scores of 0.534, 0.398, and 0.167 in audio tagging, frame-wise SED, and event-wise SED, outperforming the fully connected deep neural network baseline of 0.331, 0.237, and 0.120, respectively. In T-F segmentation, we achieved an F1 score of 0.218, a task that previous methods were not able to perform.
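
The segmentation-then-classification pipeline described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' released implementation: a small convolutional network (layer sizes and class count are assumptions) produces one T-F mask per sound event class, and global weighted rank pooling with an assumed decay factor `r` aggregates each mask into a clip-level presence probability, so the model can be trained with clip-level (weak) labels only.

```python
# Minimal sketch of the weakly-labelled SED framework described in the abstract.
# The CNN architecture, number of classes, and decay factor r are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class SegmentationCNN(nn.Module):
    """Segmentation mapping: (batch, 1, time, mel) -> (batch, classes, time, mel)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        )
        # 1x1 convolution + sigmoid gives one mask in [0, 1] per sound event class.
        self.mask_head = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.mask_head(self.features(x)))

def global_weighted_rank_pooling(masks: torch.Tensor, r: float = 0.98) -> torch.Tensor:
    """Classification mapping: aggregate each T-F mask into a clip-level probability.

    Mask values are sorted in descending order and averaged with weights
    r^0, r^1, ..., so strong activations dominate but the whole mask contributes.
    """
    b, c, t, f = masks.shape
    flat, _ = masks.reshape(b, c, t * f).sort(dim=-1, descending=True)
    weights = r ** torch.arange(t * f, dtype=flat.dtype, device=flat.device)
    return (flat * weights).sum(dim=-1) / weights.sum()

if __name__ == "__main__":
    # Training would apply binary cross-entropy between clip_probs and the weak
    # (clip-level) labels; at test time the masks themselves give T-F segmentation
    # and, pooled over frequency, frame-wise detections with onsets and offsets.
    model = SegmentationCNN(num_classes=10)           # 10 classes is an assumption
    clip = torch.randn(4, 1, 320, 64)                 # (batch, 1, frames, mel bins)
    masks = model(clip)                               # T-F segmentation masks
    clip_probs = global_weighted_rank_pooling(masks)  # presence probabilities
    print(masks.shape, clip_probs.shape)              # (4, 10, 320, 64), (4, 10)
```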

