Source: Conference on Sound and Music Technology

A Comparison of Attention Mechanisms of Convolutional Neural Network in Weakly Labeled Audio Tagging

Abstract

Audio tagging aims to predict the types of sound events occurring in audio clips. Recently, the convolutional recurrent neural network (CRNN) has achieved state-of-the-art performance in audio tagging. In a CRNN, convolutional layers are applied to the input audio features to extract high-level representations, which are then passed to recurrent layers. To better learn high-level representations of acoustic features, attention mechanisms have been introduced into the convolutional layers of the CRNN. Attention is a learning technique that steers the model toward the information most important to the task in order to improve performance. Two attention mechanisms used in the CRNN, the Squeeze-and-Excitation (SE) block and the gated linear unit (GLU), are both based on a gating mechanism, but they differ in what they attend to. To compare the SE block and the GLU, we apply a CRNN with an SE block (SE-CRNN) and a CRNN with a GLU (GLU-CRNN) to weakly labeled audio tagging and compare their results with a CRNN baseline. The experiments show that the GLU-CRNN achieves an area under the curve (AUC) score of 0.877 in polyphonic audio tagging, outperforming the SE-CRNN (0.865) and the CRNN baseline (0.838). These results indicate that GLU-based attention performs better than SE-block-based attention in a CRNN for weakly labeled polyphonic audio tagging.
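
The difference between the two gating mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of tensors, omits biases, and takes the GLU gate logits as a given input, whereas in a real network they would come from a second convolution. The function names `glu_gate` and `se_gate`, the weight matrices `w1` and `w2`, and all numeric values are hypothetical and chosen only for illustration. The sketch shows that a GLU-style gate assigns one sigmoid weight per time-frequency element, while an SE-style gate assigns one sigmoid weight per channel, derived from a global average over that channel.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu_gate(values, gate_logits):
    """GLU-style gating: every element is scaled by its own sigmoid gate.

    Both arguments are nested lists shaped [channel][time][frequency].
    """
    return [[[v * sigmoid(g) for v, g in zip(v_row, g_row)]
             for v_row, g_row in zip(v_ch, g_ch)]
            for v_ch, g_ch in zip(values, gate_logits)]


def se_gate(features, w1, w2):
    """SE-style gating: one scalar gate per channel.

    w1 maps channels to a bottleneck, w2 maps the bottleneck back to channels
    (assumed weights, biases omitted).
    """
    # Squeeze: global average over time and frequency for each channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in features]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: every element of a channel shares that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates


# Hypothetical feature map: 2 channels, 2 time frames, 2 frequency bins.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
# Hypothetical gate logits with the same shape as the feature map.
gate_logits = [
    [[0.0, 10.0], [-10.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, 0.5]]      # 2 channels -> 1 bottleneck unit
w2 = [[1.0], [-1.0]]   # 1 bottleneck unit -> 2 channels

glu_out = glu_gate(features, gate_logits)
se_out, se_gates = se_gate(features, w1, w2)
```

In this toy input, the GLU-style gate can pass one time-frequency element of a channel almost unchanged while suppressing its neighbor, whereas the SE-style gate can only raise or lower a whole channel at once.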
