Published in: 2019 IEEE International Conference on Signals and Systems

Time-Frequency Image Resizing Using Interpolation for Acoustic Event Recognition with Convolutional Neural Networks


Abstract

Convolutional neural networks (CNNs) are increasingly used for audio signal classification, including acoustic event recognition. A CNN is an image classifier, so acoustic event signals are often represented as time-frequency images for this purpose. However, the duration of sound event signals can vary greatly, and an important consideration is how to produce equally sized time-frequency images for classification with a CNN. In this paper, we use techniques from digital image processing to address this problem. In particular, we apply interpolation-based image resizing techniques to form equally sized time-frequency representations. We consider nearest-neighbor, bilinear, bicubic, and Lanczos kernel interpolation methods for this purpose. A database containing 50 sound event classes, with sound events of varying duration, is used to evaluate the classification performance of these resized time-frequency images. The results show that time-frequency images resized using bicubic and Lanczos kernel interpolation give much better classification performance than the conventional time-frequency image representation.
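
The abstract does not give implementation details, so the following is only a minimal sketch of interpolation-based resizing, not the authors' code. It assumes the time-frequency image is a list of rows (frequency bins) of floats, resizes separably along time and then frequency, and clamps at the image edges. The names `resample_row` and `resize_image`, the pixel-centre alignment, and the Lanczos window size `a=3` are all illustrative assumptions. Bicubic interpolation is omitted for brevity, and the kernels are not widened when downscaling, so no anti-aliasing is applied.

```python
import math

def nearest_kernel(x):
    """Box kernel: picks the single closest input sample."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def linear_kernel(x):
    """Triangle kernel: linear interpolation (bilinear when applied per axis)."""
    return max(0.0, 1.0 - abs(x))

def lanczos_kernel(x, a=3):
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0 (a=3 assumed)."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def resample_row(row, new_len, kernel, support):
    """Resample a 1-D sequence to new_len samples using the given kernel."""
    old_len = len(row)
    scale = old_len / new_len
    out = []
    for j in range(new_len):
        # Position of output sample j in input coordinates (pixel centres aligned).
        centre = (j + 0.5) * scale - 0.5
        base = math.floor(centre)
        acc, wsum = 0.0, 0.0
        for i in range(base - support + 1, base + support + 1):
            w = kernel(centre - i)
            if w == 0.0:
                continue
            clamped = min(max(i, 0), old_len - 1)  # repeat edge samples
            acc += w * row[clamped]
            wsum += w
        out.append(acc / wsum)  # normalise so the weights sum to 1
    return out

def resize_image(img, new_h, new_w, kernel=lanczos_kernel, support=3):
    """Resize a 2-D image (list of rows) separably: width first, then height."""
    rows = [resample_row(r, new_w, kernel, support) for r in img]
    cols = [resample_row(list(c), new_h, kernel, support) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

For example, `resize_image(spec, 32, 32)` would map a hypothetical spectrogram `spec` of any duration to a fixed 32 x 32 input; passing `kernel=linear_kernel, support=1` or `kernel=nearest_kernel, support=1` selects the bilinear or nearest-neighbor variant instead. The target size here is arbitrary and is not taken from the paper.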
