Journal: International Journal of Multimedia Information Retrieval

Beyond audio and video retrieval: topic-oriented multimedia summarization



Abstract

Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to examine and organize these large stores of information effectively, in ways that go beyond browsing or collaborative filtering. In this paper, we review previous work on audio and video processing, and define the task of topic-oriented multimedia summarization (TOMS) using natural language generation (NLG): given a set of features automatically extracted from a video, a TOMS system automatically generates a paragraph of natural language that summarizes the important information in a video belonging to a certain topic and, for example, explains why the video was matched and retrieved. Possible features include visual semantic concepts, objects, and actions; environmental sounds; and transcripts from automatic speech recognition (ASR). We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. We then introduce our approach to solving the TOMS problem. We extract various visual concept features, environmental sounds, and ASR transcription features from a given video, and develop a template-based NLG system that produces a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present the results of a pilot evaluation of our initial system.
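The idea of a template-based recounting can be illustrated with a minimal sketch. It is not the authors' system: the `Detection` record, the modality names, the template wording, the score threshold, and the `top_k` cutoff are all assumptions chosen for illustration. The sketch takes already-extracted features with confidence scores, keeps the most confident ones per modality, and fills one sentence template per modality.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One extracted feature (hypothetical record, not from the paper)."""
    modality: str  # "visual", "audio", or "asr"
    label: str     # human-readable name of the detected concept
    score: float   # detector confidence in [0, 1]


# One clause template per modality; the wording is an illustrative assumption.
TEMPLATES = {
    "visual": "we saw {items}",
    "audio": "we heard {items}",
    "asr": "the speech mentions {items}",
}


def join_items(items):
    """Join labels as 'a', 'a and b', or 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def recount(topic, detections, threshold=0.5, top_k=3):
    """Build a one-sentence explanation of why a video matched a topic."""
    clauses = []
    for modality, template in TEMPLATES.items():
        # Keep confident detections of this modality, best first.
        kept = sorted(
            (d for d in detections
             if d.modality == modality and d.score >= threshold),
            key=lambda d: d.score,
            reverse=True,
        )[:top_k]
        if kept:
            labels = [d.label for d in kept]
            clauses.append(template.format(items=join_items(labels)))
    if not clauses:
        return f"No confident evidence was found for the topic '{topic}'."
    return (f"This video was matched to the topic '{topic}' because "
            + "; ".join(clauses) + ".")


if __name__ == "__main__":
    example = [
        Detection("visual", "a cake", 0.9),
        Detection("visual", "candles", 0.8),
        Detection("visual", "a dog", 0.2),  # below threshold, dropped
        Detection("audio", "singing", 0.7),
        Detection("asr", "happy birthday", 0.95),
    ]
    print(recount("birthday party", example))
```

Fixed templates keep the output grammatical and traceable to specific detections, at the cost of repetitive phrasing. A real system would also need to decide how to order and aggregate evidence across modalities.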
