Beyond audio and video retrieval: topic-oriented multimedia summarization

Florian Metze; Duo Ding; Ehsan Younessian; Alexander Hauptmann

Abstract

Given the deluge of multimedia content that is becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this paper, we review previous work on audio and video processing and define the task of topic-oriented multimedia summarization (TOMS) using natural language generation (NLG): given a set of features automatically extracted from a video, a TOMS system automatically generates a paragraph of natural language that summarizes the important information in a video belonging to a certain topic and, for example, explains why the video was matched and retrieved. Possible features include visual semantic concepts, objects, and actions; environmental sounds; and transcripts from automatic speech recognition (ASR). We see this as a first step towards systems that will be able to discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. In this paper, we introduce our approach to solving the TOMS problem. We extract various visual concept features, environmental sounds, and ASR transcription features from a given video, and develop a template-based NLG system to produce a textual recounting based on the extracted features. We also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.

Bibliographic record

Source
International Journal of Multimedia Information Retrieval | 2013, Issue 2 | 14 pages
Authors
Florian Metze; Duo Ding; Ehsan Younessian; Alexander Hauptmann
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Library science; librarianship
Keywords
Multimedia summarization; Event detection and recounting; Natural language generation

Date added: 2022-08-18 10:38:51

Similar documents

1. Beyond audio and video retrieval: topic-oriented multimedia summarization [J]. Florian Metze, Duo Ding, Ehsan Younessian. International Journal of Multimedia Information Retrieval. 2013, Issue 2

2. Read, Watch, Listen, and Summarize: Multi-Modal Summarization for Asynchronous Text, Image, Audio and Video [J]. Li Haoran, Zhu Junnan, Ma Cong. IEEE Transactions on Knowledge and Data Engineering. 2019, Issue 5

3. Content-Aware Summarization of Broadcast Sports Videos: An Audio-Visual Feature Extraction Approach [J]. Abdullah Aman Khan, Jie Shao, Waqar Ali. Neural Processing Letters. 2020, Issue 3

4. Rate-coverage analysis and optimization for joint audio-video multimedia retrieval [C]. Guanghan Ning, Zhi Zhang, Xiaobo Ren. IEEE International Conference on Acoustics, Speech and Signal Processing. 2017

5. Rate-distortion optimal video summarization and video retrieval based on principal component space geometry matching [D]. Li, Zhu. 2004

6. Automatic Summarization of MEDLINE Citations for Evidence-Based Medical Treatment: A Topic-Oriented Evaluation [O]. Marcelo Fiszman, Dina Demner-Fushman, Halil Kilicoglu

7. A fuzzy video content representation for video summarization and content-based retrieval [O]. Doulamis AD, Doulamis ND, Kollias SD. 2000

