Journal: Neurocomputing

Multi-Scale and Single-Scale Fully Convolutional Networks for Sound Event Detection

Abstract

Among Sound Event Detection (SED) systems, Recurrent Neural Networks (RNNs), such as long short-term memory (LSTM) units and gated recurrent units (GRUs), are commonly used to capture temporal dependencies. However, the length of the temporal dependencies an RNN can model is limited, so it fails to model sound events of long duration. Moreover, RNNs cannot process data in parallel, which leads to low efficiency and low industrial value. Given these shortcomings, we propose using dilated convolution (and causal dilated convolution) to capture temporal dependencies, because it preserves high temporal resolution and obtains longer temporal dependencies while the filter size and the network depth remain unchanged. In addition, dilated convolution can be parallelized, so it offers higher efficiency and industrial value. On this basis, we propose Single-Scale Fully Convolutional Networks (SS-FCN), composed of convolutional neural networks and dilated convolutional networks, with the former providing frequency invariance and the latter capturing temporal dependencies. Using dilated convolution to control the length of temporal dependencies, we observe that SS-FCN, which models a single length of temporal dependencies, achieves superior detection performance for a limited set of event types. For better performance, we propose Multi-Scale Fully Convolutional Networks (MS-FCN), in which a feature fusion module captures both long- and short-term dependencies by fusing features with different lengths of temporal dependencies. The proposed methods achieve competitive performance on three main datasets with higher efficiency. The results show that SED systems based on Fully Convolutional Networks have further research value and potential. (c) 2020 Elsevier B.V. All rights reserved.
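The claim that dilation lengthens temporal dependencies with the filter size and depth unchanged can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the single-channel setup, and the fusion by simple stacking are all assumptions made for illustration, since the abstract does not specify the layers or the feature fusion module.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Input frames seen by one output frame of a stack of stride-1
    dilated 1-D convolutions (one dilation rate per layer)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated convolution over time for a 1-D signal.

    y[t] = sum_k w[k] * x[t - (K - 1 - k) * dilation], with zeros
    assumed before t = 0, so y[t] never depends on future frames.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding only
    t_len = len(x)
    y = np.zeros(t_len)
    for i in range(k):
        y += w[i] * xp[i * dilation : i * dilation + t_len]
    return y


def multi_scale_features(x, w, dilations):
    """Hypothetical fusion: stack branches with different dilation
    rates along a new axis (an assumption, not the paper's module)."""
    return np.stack([causal_dilated_conv1d(x, w, d) for d in dilations])


# Same filter size (3) and depth (4 layers), different dilation rates.
rf_plain = receptive_field(3, [1, 1, 1, 1])
rf_dilated = receptive_field(3, [1, 2, 4, 8])

# An impulse at frame 5 only affects frames 5, 7 and 9 (dilation 2).
x = np.zeros(12)
x[5] = 1.0
y = causal_dilated_conv1d(x, np.ones(3), dilation=2)
fused = multi_scale_features(x, np.ones(3), dilations=(1, 2, 4))
```

With these settings the undilated stack covers 9 frames and the dilated stack covers 31, while the output keeps one value per input frame, which is the sense in which time resolution is preserved.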
