Journal: Expert Systems with Applications

Learning audio sequence representations for acoustic event classification



Abstract

Acoustic Event Classification (AEC) has become an important task for machines that need to perceive the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of acoustic events remains challenging. Previous methods mainly focused on designing audio features in a 'handcrafted' manner. Data-learnt features have recently been reported to perform better, but so far they have only been considered at the frame level. In this article, we propose an unsupervised learning framework that learns a vector representation of an audio sequence for AEC. The framework consists of a Recurrent Neural Network (RNN) encoder and an RNN decoder, which respectively transform the variable-length audio sequence into a fixed-length vector and reconstruct the input sequence from the generated vector. After training the encoder-decoder, we feed the audio sequences to the encoder and take the learnt vectors as the audio sequence representations. Compared with previous methods, the proposed method can not only handle audio streams of arbitrary length, but also learn the salient information of the sequence. An extensive evaluation on a large acoustic event database shows that the learnt audio sequence representation outperforms other state-of-the-art hand-crafted sequence features for AEC by a large margin.
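
The encoder-decoder idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the recurrent cell, the frame features, or the training procedure, so the plain tanh (Elman) cell, the random untrained weights, the class name `TinySeqAutoencoder`, and the toy 4-dimensional frames below are all assumptions made for illustration. The sketch only shows the forward pass: clips of different lengths map to vectors of the same size, and the decoder unrolls that vector back into a sequence whose reconstruction error would be the training objective.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (untrained; for illustration only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySeqAutoencoder:
    """Hypothetical Elman-RNN encoder-decoder, forward pass only."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Encoder weights: input-to-hidden and hidden-to-hidden.
        self.W_xh = make_matrix(hidden_dim, feat_dim, rng)
        self.W_hh = make_matrix(hidden_dim, hidden_dim, rng)
        # Decoder weights: hidden-to-hidden and hidden-to-output.
        self.U_hh = make_matrix(hidden_dim, hidden_dim, rng)
        self.U_hy = make_matrix(feat_dim, hidden_dim, rng)

    def encode(self, frames):
        """Fold a variable-length list of frames into one fixed-length vector."""
        h = [0.0] * self.hidden_dim
        for x in frames:
            pre = [a + b for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
            h = [math.tanh(z) for z in pre]
        return h  # the final hidden state is the sequence representation

    def decode(self, vector, num_frames):
        """Unroll the fixed-length vector back into num_frames output frames."""
        h = list(vector)
        outputs = []
        for _ in range(num_frames):
            h = [math.tanh(z) for z in matvec(self.U_hh, h)]
            outputs.append(matvec(self.U_hy, h))
        return outputs


def reconstruction_error(frames, recon):
    """Mean squared error between input frames and reconstructed frames."""
    total = sum((a - b) ** 2 for f, r in zip(frames, recon) for a, b in zip(f, r))
    return total / (len(frames) * len(frames[0]))


# Toy data: two "clips" of different lengths, 4 features per frame (assumed).
data_rng = random.Random(1)
short_clip = [[data_rng.random() for _ in range(4)] for _ in range(5)]
long_clip = [[data_rng.random() for _ in range(4)] for _ in range(12)]

model = TinySeqAutoencoder(feat_dim=4, hidden_dim=8)
v_short = model.encode(short_clip)  # length-8 vector from 5 frames
v_long = model.encode(long_clip)    # length-8 vector from 12 frames
recon = model.decode(v_short, len(short_clip))
err = reconstruction_error(short_clip, recon)
print(len(v_short), len(v_long), len(recon))
```

In a trained system the weights would be fitted by minimising the reconstruction error over unlabelled audio, and the encoder output (`v_short`, `v_long` here) would then be passed to a separate classifier.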
