IEEE International Conference on Acoustics, Speech and Signal Processing

AUDIOVISUAL CLASSIFICATION OF VOCAL OUTBURSTS IN HUMAN CONVERSATION USING LONG-SHORT-TERM MEMORY NETWORKS



Abstract

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) recurrent neural networks as highly successful dynamic sequence classifiers. The Audiovisual Interest Corpus of natural human-to-human conversation, featured in this year's Paralinguistic Challenge, serves as the evaluation database. For the video-based analysis we compare shape-based and appearance-based features; these are fused with typical audio descriptors at the feature level (early fusion). The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant gain in performance when fusing audio and visual shape features.
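The early fusion mentioned in the abstract amounts to concatenating the synchronised per-frame audio and visual descriptors into one joint feature vector before classification. A minimal NumPy sketch under that reading; the feature dimensions and random data are illustrative, not taken from the paper:

```python
import numpy as np

# Hypothetical per-frame descriptors for a 5-frame sequence:
# audio: e.g. cepstral-style features (5 frames x 13 dims)
# visual: e.g. facial shape features (5 frames x 8 dims)
rng = np.random.default_rng(0)
audio_feats = rng.standard_normal((5, 13))
visual_feats = rng.standard_normal((5, 8))

def early_fusion(audio, visual):
    """Feature-level fusion: concatenate frame-aligned descriptor streams."""
    assert audio.shape[0] == visual.shape[0], "streams must be frame-aligned"
    return np.concatenate([audio, visual], axis=1)

fused = early_fusion(audio_feats, visual_feats)
print(fused.shape)  # (5, 21): one joint vector per frame for the classifier
```

The fused sequence is then what a dynamic classifier such as an LSTM network consumes frame by frame.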
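The contrast drawn with a static SVM rests on the LSTM's gated recurrence: a cell state is carried across frames, so the classifier can exploit the temporal dynamics of a vocal outburst. A minimal NumPy sketch of one LSTM step, with random toy weights and illustrative dimensions (this is not the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM time step: gate activations from input x and previous state."""
    H = h_prev.shape[0]
    z = Wx @ x + Wh @ h_prev + b       # all four gate pre-activations stacked
    i = sigmoid(z[0:H])                # input gate
    f = sigmoid(z[H:2 * H])            # forget gate
    o = sigmoid(z[2 * H:3 * H])        # output gate
    g = np.tanh(z[3 * H:4 * H])        # candidate cell update
    c = f * c_prev + i * g             # cell state carries long-term memory
    h = o * np.tanh(c)                 # hidden state / output at this frame
    return h, c

# Run a toy 21-dimensional fused feature sequence through the recurrence.
rng = np.random.default_rng(1)
D, H, T = 21, 16, 5
Wx = rng.standard_normal((4 * H, D)) * 0.1
Wh = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((T, D)):
    h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape)  # final hidden state, e.g. fed to a per-class output layer
```

A static SVM, by contrast, would see each 21-dimensional frame (or one pooled vector) in isolation, which is the gap the reported LSTM improvements exploit.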
