Conference on Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments

Multimodal Emotion Classification by Streaming Fixed Time Segments for Speaker Movies



Abstract

Video-Audio Emotion Recognition benefits from the additional information provided by multiple modalities. Because the target features are time-dependent but not strictly aligned in time, the video-audio features effectively reduce to separate video features and audio features. Toward this goal, we select the spectrogram, a strong vocal feature for neural network models, so that the audio input can benefit from convolution filters. Inspired by LSTM-based image captioning, in which embedded word information and image information are spatially aligned, we embed the audio spectrogram together with the image sequence, since a spectrogram converts time information into spatial information. We propose an architecture and a framework that optimize the alignment of these temporal features, analyze the resulting significant performance improvement, and discuss general Video-Audio Emotion Recognition tasks.
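The abstract gives no implementation details. As a rough illustration of how a spectrogram turns the time axis into a spatial (image) axis, and of what cutting audio and video into fixed time segments might look like, here is a minimal NumPy sketch. The function names `magnitude_spectrogram` and `fixed_time_segments`, the FFT size, hop length, video frame rate, and one-second segment length are all hypothetical choices for illustration, not values or methods taken from the paper.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=400, hop=160):
    """Hann-windowed STFT magnitude.

    Rows are frequency bins and columns are time frames, so time becomes
    the horizontal (spatial) axis of a 2-D "image" a CNN can convolve over.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft per frame gives (n_frames, n_fft // 2 + 1); transpose so time runs left to right
    return np.abs(np.fft.rfft(frames, axis=1)).T


def fixed_time_segments(spec, hop, sample_rate, video_fps, segment_seconds):
    """Hypothetical helper: cut the spectrogram and the video frame indices
    into windows of equal duration so each audio patch pairs with the video
    frames covering the same time span. Trailing partial segments are dropped."""
    cols_per_segment = int(round(segment_seconds * sample_rate / hop))
    frames_per_segment = int(round(segment_seconds * video_fps))
    n_segments = spec.shape[1] // cols_per_segment
    segments = []
    for k in range(n_segments):
        audio_patch = spec[:, k * cols_per_segment : (k + 1) * cols_per_segment]
        video_idx = list(range(k * frames_per_segment, (k + 1) * frames_per_segment))
        segments.append((audio_patch, video_idx))
    return segments


if __name__ == "__main__":
    sr = 16000
    t = np.arange(3 * sr) / sr
    audio = np.sin(2 * np.pi * 440.0 * t)  # 3 s test tone standing in for speech
    spec = magnitude_spectrogram(audio, n_fft=400, hop=160)
    for patch, frames in fixed_time_segments(spec, 160, sr, 25, 1.0):
        print(patch.shape, frames[0], frames[-1])
```

Each `(audio_patch, video_idx)` pair could then serve as one step of a sequence model such as an LSTM. The abstract does not specify how the paper embeds or fuses the two streams, so this only shows the time-to-space conversion and the fixed-segment pairing.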

