Global Affective Video Content Regression Based on Complementary Audio-Visual Features


Abstract

In this paper, we propose a new framework for global affective video content regression with five complementary audio-visual features. For the audio modality, we select the global audio feature eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, the key frames of the original images and those of the optical flow images are both used to extract VGG-19 features with fine-tuned models, so that the original visual cues are represented in conjunction with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16), and compare our results with those of competitive teams in EIMT16 and with the state-of-the-art method. The experimental results show that the fusion of the five features achieves better regression results in both the arousal and valence dimensions, indicating that the five selected features are complementary to each other across the audio-visual modalities. Furthermore, the proposed approach achieves better regression results than the state-of-the-art method on both evaluation metrics, MSE and PCC, in the arousal dimension, and comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC result than the state-of-the-art method in the valence dimension, the fused feature vectors used in our framework have far fewer dimensions, 1752 in total, which is only five thousandths of the feature dimensions in the state-of-the-art method and greatly reduces the memory requirements and computational burden.
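As a rough illustration of the pipeline the abstract describes, the sketch below fuses five per-video feature vectors and computes the two evaluation metrics named above, MSE and PCC (Pearson correlation coefficient). It rests on assumptions that the abstract does not confirm: fusion is shown as plain concatenation, and the per-feature sizes in ASSUMED_DIMS are hypothetical values chosen only so that they sum to the stated total of 1752. The function names are likewise illustrative, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-feature sizes (an assumption for illustration).
# Only the total of 1752 dimensions is stated in the abstract.
ASSUMED_DIMS: Dict[str, int] = {
    "eGeMAPS": 88,
    "SoundNet": 512,
    "VGGish": 128,
    "VGG19_rgb": 512,
    "VGG19_flow": 512,
}


def fuse(features: Dict[str, Sequence[float]]) -> List[float]:
    """Fuse per-video features by concatenation in a fixed order (assumed)."""
    fused: List[float] = []
    for name, dim in ASSUMED_DIMS.items():
        vec = features[name]
        if len(vec) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(vec)}")
        fused.extend(vec)
    return fused


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between ground-truth and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between ground truth and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined for constant inputs")
    return cov / math.sqrt(var_t * var_p)


# Usage: placeholder feature vectors standing in for real extractor outputs.
video = {name: [0.0] * dim for name, dim in ASSUMED_DIMS.items()}
fused_vector = fuse(video)  # one 1752-dimensional vector per video
```

A lower MSE and a higher PCC both indicate better agreement between predicted and annotated arousal or valence scores, which is how the comparison in the abstract should be read.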


Bibliographic details

Source
International Conference on Multimedia Modeling, 2020, pp. 540-550 (11 pages)
Authors
Xiaona Guo; Wei Zhong; Long Ye; Li Fang; Yan Heng; Qin Zhang;
Original format: PDF
Keywords
Affective video content regression; eGeMAPS; SoundNet; VGG; Optical flow;

Date added to database: 2022-08-26 13:55:05

