A multi-modal video analysis system

Abstract

In this paper, we present a system for managing Chinese news programs based on cross-media video analysis. Audio, caption text, and video frames all contribute to a viewer's understanding of a video. Accordingly, we devised a system that integrates continuous Chinese speech recognition (ASR), video caption text recognition (VOCR), and object/scene recognition (OR). The news program is first segmented into a series of segments by anchorperson detection. The ASR and VOCR recognition results are then treated as two paragraphs of text, each of which is converted into a bag of words representing the original recognition result. By analysing the correspondence between the words in the ASR result and those in the VOCR result, we obtain a trusted set of words that describes the video content of a news segment. In the last step, we perform object/scene classification on keyframes, aided by the recognized words above. Experiments show that our news management system is efficient.
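
The word-correspondence step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual matching rule, so this example simply assumes that a word is trusted when it appears in both the ASR and the VOCR bag of words. The function names, the `min_count` parameter, and the sample tokens are hypothetical, and the inputs are assumed to be already segmented into words.

```python
from collections import Counter


def bag_of_words(tokens):
    """Count how often each token occurs in one recognition result."""
    return Counter(tokens)


def trusted_words(asr_tokens, vocr_tokens, min_count=1):
    """Return words supported by both modalities (assumed matching rule).

    The paper's real correspondence analysis is not specified in the
    abstract; plain intersection of the two bags is an illustration only.
    """
    asr_bag = bag_of_words(asr_tokens)
    vocr_bag = bag_of_words(vocr_tokens)
    # Counter & Counter keeps keys present in both, with the smaller count.
    common = asr_bag & vocr_bag
    return {word: count for word, count in common.items() if count >= min_count}


# Hypothetical, already-segmented recognition output for one news segment.
asr = ["beijing", "today", "holds", "conference", "economy"]
vocr = ["beijing", "conference", "economy", "development"]
trusted = trusted_words(asr, vocr)
print(sorted(trusted))
```

Intersection discards words that only one recognizer produced, which is one simple way to suppress recognition errors that are unlikely to occur identically in both the audio and the caption channels.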

Bibliographic record

Source: 2011 IEEE 3rd International Conference on Communication Software and Networks, 2011, pp. 176-179 (4 pages)
Authors: Zhang Shilin; Li Heping; Zhang Shuwu
Original format: PDF
Chinese Library Classification: Communications
Keywords: ASR; BOW; HOG; SIFT; SVM; VOCR
