Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Duration-Controlled LSTM for Polyphonic Sound Event Detection

Abstract

This paper presents a new hybrid approach, duration-controlled long short-term memory (LSTM), for polyphonic sound event detection (SED). It builds on a state-of-the-art SED method that performs frame-by-frame detection with a bidirectional LSTM recurrent neural network (BLSTM), and adds a duration-controlled modeling technique based on a hidden semi-Markov model. The proposed approach makes it possible to model the duration of each sound event precisely and to perform sequence-by-sequence detection without resorting to thresholding, as conventional frame-by-frame methods must. Furthermore, to reduce sound event insertion errors, which often occur under noisy conditions, we introduce a binary-mask-based postprocessing step that relies on a sound activity detection network to identify segments containing any sound event activity, an approach inspired by the well-known benefits of voice activity detection in speech recognition systems. We conduct an experiment on the DCASE2016 task 2 dataset to compare the proposed method with typical conventional methods, such as nonnegative matrix factorization and standard BLSTM. The proposed method outperforms the conventional methods in both an event-based evaluation, achieving a 75.3% F1 score and a 44.2% error rate, and a segment-based evaluation, achieving an 81.1% F1 score and a 32.9% error rate. These results also surpass the best results reported in the DCASE2016 task 2 Challenge.
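
The binary-mask postprocessing idea can be illustrated with a small sketch. This is not the authors' implementation: the function names `apply_sad_mask` and `segment_scores`, the toy data, and the choice to treat each frame as one evaluation segment are all assumptions made for illustration. The scoring follows the commonly used segment-based definitions, F1 = 2TP / (2TP + FP + FN) and error rate = (substitutions + deletions + insertions) / number of reference events. The abstract does not spell these formulas out.

```python
from typing import List, Tuple

# Frames[t][c] is 1 if event class c is active in frame t, else 0.
Frames = List[List[int]]


def apply_sad_mask(event_activity: Frames, sad_mask: List[int]) -> Frames:
    """Zero out every event class in frames the SAD mask marks as silent (hypothetical helper)."""
    if len(event_activity) != len(sad_mask):
        raise ValueError("event_activity and sad_mask must cover the same frames")
    return [[a * m for a in frame] for frame, m in zip(event_activity, sad_mask)]


def segment_scores(reference: Frames, estimate: Frames) -> Tuple[float, float]:
    """Return (F1, error rate), treating each frame as one segment (a simplification)."""
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for ref, est in zip(reference, estimate):
        seg_tp = sum(1 for r, e in zip(ref, est) if r and e)
        seg_fp = sum(1 for r, e in zip(ref, est) if e and not r)
        seg_fn = sum(1 for r, e in zip(ref, est) if r and not e)
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # Within a segment, a false positive paired with a false negative
        # counts as one substitution; the remainder are deletions or insertions.
        subs += min(seg_fp, seg_fn)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += sum(ref)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy data (assumed): 4 frames, 2 event classes.
reference = [[0, 0], [1, 0], [1, 1], [0, 0]]
raw_detection = [[0, 1], [1, 0], [1, 1], [1, 0]]  # insertions in frames 0 and 3
sad_mask = [0, 1, 1, 0]                           # 1 = some sound event is active

masked_detection = apply_sad_mask(raw_detection, sad_mask)
raw_f1, raw_er = segment_scores(reference, raw_detection)
masked_f1, masked_er = segment_scores(reference, masked_detection)
print(f"raw:    F1={raw_f1:.2f} ER={raw_er:.2f}")
print(f"masked: F1={masked_f1:.2f} ER={masked_er:.2f}")
```

In this toy case the two insertions fall in frames that the mask marks as silent, so masking removes them and the error rate drops. A real system would obtain the mask from a trained sound activity detection network, and a wrong mask could also delete true events.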