International Conference on Multimedia Modeling

Global Affective Video Content Regression Based on Complementary Audio-Visual Features



Abstract

In this paper, we propose a new framework for global affective video content regression based on five complementary audio-visual features. For the audio modality, we select the global audio feature set eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, key frames of both the original images and the optical-flow images are used to extract VGG-19 features with fine-tuned models, so that the original visual cues are represented together with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competitive teams in EIMT16 and with the state-of-the-art method. The experimental results show that fusing the five features achieves better regression results in both the arousal and valence dimensions, indicating that the five selected features complement each other across the audio-visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, MSE and PCC, in the arousal dimension, and achieves comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vector used in our framework has only 1752 dimensions in total, about five thousandths of the feature dimensionality of the state-of-the-art method, which greatly reduces the memory requirements and computational burden.
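The fusion scheme the abstract describes, concatenating the five per-stream feature vectors into a single 1752-dimensional representation and regressing an affective score, can be sketched as below. This is a minimal illustration, not the authors' implementation: the per-stream dimensions (other than eGeMAPS's 88 functionals and VGGish's 128-d embedding, which are the published sizes of those feature sets) and the ridge regressor are placeholder assumptions, since the abstract specifies neither the dimension split nor the regression model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-stream dimensions summing to 1752. Only the eGeMAPS (88)
# and VGGish (128) sizes are standard; the other splits are placeholders.
dims = {"eGeMAPS": 88, "VGGish": 128, "SoundNet": 256,
        "VGG19_rgb": 640, "VGG19_flow": 640}
n_clips = 200

# Simulated per-clip features for each of the five streams.
feats = [rng.normal(size=(n_clips, d)) for d in dims.values()]

# Early fusion by concatenation into one 1752-dim vector per clip.
X = np.concatenate(feats, axis=1)

# Simulated affective annotations (e.g. valence in [-1, 1]).
y = rng.uniform(-1.0, 1.0, size=n_clips)

# Placeholder regressor: ridge regression in closed form,
# w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w
mse = np.mean((pred - y) ** 2)

print(X.shape[1])  # 1752
```

In practice one regressor would be trained per affective dimension (arousal and valence), each evaluated with MSE and PCC as in EIMT16.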
