International Conference on Multimedia Modeling

Global Affective Video Content Regression Based on Complementary Audio-Visual Features



Abstract

In this paper, we propose a new framework for global affective video content regression based on five complementary audio-visual features. For the audio modality, we select the global audio feature set eGeMAPS and two deep features, SoundNet and VGGish. For the visual modality, key frames of both the original images and the optical-flow images are used to extract VGG-19 features with fine-tuned models, so that the original visual cues are represented together with motion information. In the experiments, we evaluate the selected audio and visual features on the dataset of the Emotional Impact of Movies Task 2016 (EIMT16) and compare our results with those of the competitive teams in EIMT16 and with the state-of-the-art method. The experimental results show that fusing the five features achieves better regression results in both the arousal and valence dimensions, indicating that the five selected features complement each other across the audio-visual modalities. Furthermore, the proposed approach outperforms the state-of-the-art method on both evaluation metrics, MSE and PCC, in the arousal dimension, and achieves comparable MSE results in the valence dimension. Although our approach obtains a slightly lower PCC than the state-of-the-art method in the valence dimension, the fused feature vector used in our framework has only 1752 dimensions in total, about five thousandths of the feature dimensionality of the state-of-the-art method, which greatly reduces the memory requirements and computational burden.
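The fusion scheme the abstract describes, concatenating the five per-stream feature vectors into a single 1752-dimensional representation and regressing an affective score, can be sketched as below. This is a minimal illustration, not the authors' implementation: the per-stream dimensions (other than eGeMAPS's 88 functionals and VGGish's 128-d embedding, which are the published sizes of those feature sets) and the ridge regressor are placeholder assumptions, since the abstract specifies neither the dimension split nor the regression model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-stream dimensions summing to 1752. Only the eGeMAPS (88)
# and VGGish (128) sizes are standard; the other splits are placeholders.
dims = {"eGeMAPS": 88, "VGGish": 128, "SoundNet": 256,
        "VGG19_rgb": 640, "VGG19_flow": 640}
n_clips = 200

# Simulated per-clip features for each of the five streams.
feats = [rng.normal(size=(n_clips, d)) for d in dims.values()]

# Early fusion by concatenation into one 1752-dim vector per clip.
X = np.concatenate(feats, axis=1)

# Simulated affective annotations (e.g. valence in [-1, 1]).
y = rng.uniform(-1.0, 1.0, size=n_clips)

# Placeholder regressor: ridge regression in closed form,
# w = (X^T X + lam * I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w
mse = np.mean((pred - y) ** 2)

print(X.shape[1])  # 1752
```

In practice one regressor would be trained per affective dimension (arousal and valence), each evaluated with MSE and PCC as in EIMT16.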
