Conference: International Conference on Multimedia Modeling

Global Affective Video Content Regression Based on Complementary Audio-Visual Features



Abstract

In this paper, we propose a new framework for global affective video content regression that uses five complementary audio-visual features. For the audio modality, we select the global audio feature eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, VGG-19 features are extracted with fine-tuned models from the key frames of both the original images and the optical flow images, so that the original visual cues are represented together with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competitive teams in EIMT16 and with the state-of-the-art method. The experimental results show that fusing the five features achieves better regression results in both the arousal and valence dimensions, indicating that the five selected features are complementary to each other across the audio and visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, mean squared error (MSE) and Pearson correlation coefficient (PCC), in the arousal dimension, and achieves comparable MSE in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vectors used in our framework have far fewer dimensions: 1752 in total, only five thousandths of the feature dimensionality of the state-of-the-art method, which substantially reduces memory requirements and computational burden.
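The fusion and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It assumes that fusion means concatenating the per-video feature vectors end to end (the abstract reports only the total fused dimension, not the fusion procedure), and it uses tiny made-up vectors and scores rather than real eGeMAPS, SoundNet, VGGish, or VGG-19 features. The function names `fuse_features`, `mse`, and `pcc` are hypothetical and chosen for illustration; only the definitions of MSE and PCC themselves are standard.

```python
import math


def fuse_features(blocks):
    """Concatenation-based fusion (an assumption, not confirmed by the abstract).

    blocks: list of feature sets; each set holds one vector (list of floats)
    per video, in the same video order.
    Returns one concatenated vector per video.
    """
    n_videos = len(blocks[0])
    if any(len(block) != n_videos for block in blocks):
        raise ValueError("every feature set must cover the same videos")
    return [[x for block in blocks for x in block[i]] for i in range(n_videos)]


def mse(y_true, y_pred):
    """Mean squared error between ground-truth and predicted scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between ground truth and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Toy data: 2 videos, 5 feature sets with made-up dimensions 2, 1, 1, 2, 2.
toy_blocks = [
    [[0.1, 0.2], [0.3, 0.4]],  # stand-in for a global audio feature
    [[0.5], [0.6]],            # stand-in for a deep audio feature
    [[0.7], [0.8]],            # stand-in for a second deep audio feature
    [[0.9, 1.0], [1.1, 1.2]],  # stand-in for key-frame visual features
    [[1.3, 1.4], [1.5, 1.6]],  # stand-in for optical-flow visual features
]
fused = fuse_features(toy_blocks)
print(len(fused), len(fused[0]))  # number of videos, fused dimension

# Made-up arousal scores, used only to exercise the two metrics.
y_true = [0.1, 0.4, 0.35, 0.8]
y_pred = [0.2, 0.3, 0.45, 0.7]
print(mse(y_true, y_pred), pcc(y_true, y_pred))
```

Lower MSE and higher PCC both indicate better regression, which is why the abstract reports the two metrics separately for the arousal and valence dimensions.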
