European Signal Processing Conference

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training

Abstract

In this paper, we propose Hodge and Podge, a method for sound event detection, and demonstrate it on the dataset of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge Task 4. This task aims to predict the presence or absence of sound events in home environments, together with their onset and offset times. Sound event detection is challenging because large-scale, real, strongly labeled data is scarce. Recently, deep semi-supervised learning (SSL) has proven effective for modeling with weakly labeled and unlabeled data. This work explores how to extend deep SSL into a new, state-of-the-art sound event detection method. With a convolutional recurrent neural network (CRNN) as the backbone, we first introduce a multi-scale squeeze-excitation mechanism, yielding a pyramid squeeze-excitation CRNN. The pyramid squeeze-excitation layer accounts for the fact that different sound events have different durations, and it adaptively recalibrates channel-wise spectrogram responses. Further, to remedy the lack of real strongly labeled data, we propose multi-hot MixMatch and composition consistency training with temporal-frequency augmentation. Our experiments on the public DCASE 2019 Challenge Task 4 validation data yielded an event-based F-score of 43.4%, about 1.6% (absolute) better than the state-of-the-art methods in the challenge, whereas the official baseline achieves an F-score of 25.8%.
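
The abstract does not spell out how MixMatch is adapted to multi-hot targets, so the following is only a minimal sketch of the generic MixMatch ingredients (label guessing, sharpening, mixup) applied to independent per-class sigmoid outputs. The per-class binary sharpening rule, the function names, and the default values of `temperature` and `alpha` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def sharpen_multi_hot(p, temperature=0.5):
    """Sharpen each class probability independently (assumed binary rule).

    Each class is treated as its own two-way distribution (p, 1 - p),
    so several classes can be pushed toward 1 at the same time.
    """
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    pos = p ** (1.0 / temperature)
    neg = (1.0 - p) ** (1.0 / temperature)
    return pos / (pos + neg)


def guess_labels(predictions, temperature=0.5):
    """Average sigmoid outputs over K augmentations, then sharpen.

    predictions: array of shape (K, batch, classes).
    """
    return sharpen_multi_hot(predictions.mean(axis=0), temperature)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Mix two examples; the larger weight stays with the first one."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```

Under these assumptions, a guessed probability of 0.8 moves toward 1, a probability of 0.2 moves toward 0, and 0.5 stays where it is. Unlike the softmax sharpening used for single-label classification, this does not force the classes to compete, which is what overlapping sound events require.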