Conference on Sound and Music Technology

A Comparison of Attention Mechanisms of Convolutional Neural Network in Weakly Labeled Audio Tagging

Abstract

Audio tagging aims to predict the types of sound events that occur in an audio clip. Recently, the convolutional recurrent neural network (CRNN) has achieved state-of-the-art performance in audio tagging. In a CRNN, convolutional layers are applied to the input audio features to extract high-level representations, which are then passed to recurrent layers. To learn better high-level representations of acoustic features, attention mechanisms have been introduced into the convolutional layers of the CRNN. Attention is a learning technique that can steer the model toward the information most relevant to the task, thereby improving performance. Two attention mechanisms used in the CRNN, the Squeeze-and-Excitation (SE) block and the gated linear unit (GLU), are both based on gating, but they gate different things: the SE block rescales entire feature channels, whereas the GLU gates each element of the feature map individually. To compare the SE block and the GLU, we apply a CRNN with an SE block (SE-CRNN) and a CRNN with a GLU (GLU-CRNN) to weakly labeled audio tagging and compare their results with a CRNN baseline. In the experiments, the GLU-CRNN achieves an area under the curve (AUC) score of 0.877 in polyphonic audio tagging, outperforming the SE-CRNN (0.865) and the CRNN baseline (0.838). These results show that, in a CRNN for weakly labeled polyphonic audio tagging, GLU-based attention performs better than SE-block-based attention.
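To make the difference between the two gates concrete, here is a minimal plain-Python sketch; it is not the authors' implementation. It assumes a feature map stored as a list of channels, with each channel's time-frequency bins flattened into one list. The SE weight matrices `w1`/`w2` and the GLU inputs `a`/`b` (which in a real GLU layer would be the outputs of two parallel convolutions) are hypothetical placeholders, as are all function names.

```python
import math
from typing import List

Matrix = List[List[float]]
# Hypothetical layout: feature_map[channel][flattened time-frequency bin]
FeatureMap = List[List[float]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product (stands in for a fully connected layer)."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def se_block(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Squeeze-and-Excitation: rescale each channel by one learned gate."""
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gives one gate per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    s = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: every element of channel c is multiplied by the same gate s[c].
    return [[s_c * v for v in ch] for s_c, ch in zip(s, x)]


def glu(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Gated linear unit: A * sigmoid(B), with a separate gate per element."""
    return [[av * sigmoid(bv) for av, bv in zip(ach, bch)]
            for ach, bch in zip(a, b)]


if __name__ == "__main__":
    # Toy feature map: 2 channels x 2 bins (hypothetical values).
    x = [[1.0, 3.0], [2.0, 2.0]]
    # Hypothetical SE weights with reduction to a single hidden unit.
    print(se_block(x, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]]))
    # Hypothetical gate logits; in a real layer these come from a second convolution.
    print(glu(x, [[0.0, 100.0], [-100.0, 0.0]]))
```

In the SE block all values within a channel share one gate, so their ratios are preserved (the first channel stays in a 1:3 ratio); in the GLU each time-frequency bin can be passed or suppressed independently.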
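The results above are reported as AUC scores. The following is a minimal sketch of how such a score can be computed for multi-label tagging. It assumes the per-tag ROC AUC is averaged across tags (macro averaging), which the abstract does not specify; `binary_auc` and `macro_auc` are illustrative names. It uses the pairwise (Mann-Whitney) form of ROC AUC, which costs O(P·N) per tag and is only meant for small examples.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one tag: P(random positive clip scores above random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative clip")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def macro_auc(labels: List[List[int]], scores: List[List[float]]) -> float:
    """Assumed macro average: labels[i][k] is 1 if tag k occurs in clip i."""
    n_tags = len(labels[0])
    per_tag = [binary_auc([row[k] for row in labels],
                          [row[k] for row in scores])
               for k in range(n_tags)]
    return sum(per_tag) / n_tags


if __name__ == "__main__":
    # Toy clip-level (weak) labels and predicted probabilities: 4 clips x 2 tags.
    labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    scores = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.7], [0.6, 0.3]]
    print(macro_auc(labels, scores))
```

Because only clip-level labels are needed, this metric fits the weakly labeled setting: the model's clip-level probabilities are ranked against the clip-level tags, with no frame-level annotation involved.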