IEEE Transactions on Multimedia

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Abstract

Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. In this work, detection of attention-invoking audiovisual segments is formulated on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural (auditory) saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual (linguistic) saliency is extracted from part-of-speech tagging of the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated into a multimodal saliency curve that models the time-varying perceptual importance of the composite video stream and signifies prevailing sensory events. This multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations, with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
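The abstract names the pipeline stages (per-modality saliency, fusion into one curve, bottom-up selection) without giving formulas. The following is a minimal Python sketch of that flow under loudly stated assumptions, not the authors' method. It assumes the Teager-Kaiser energy operator as the "nonlinear operator" for aural energy tracking, applied directly to the waveform without the multiband filtering a real system would need. It treats the visual and textual curves as given per-frame inputs. It uses a min-max normalized, weighted linear sum as the fusion scheme, which is only one of many possible schemes. Finally, it keeps the top-ranked frames as the summary. All function names (`teager_kaiser`, `frame_teager_energy`, `normalize`, `fuse`, `select_frames`) and the toy input values are hypothetical.

```python
from typing import List, Sequence, Tuple


def teager_kaiser(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Assumed stand-in for the paper's nonlinear energy-tracking operators.
    The two endpoints copy their nearest interior value so the output
    has the same length as the input.
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    psi = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [psi[0]] + psi + [psi[-1]]


def frame_teager_energy(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean Teager energy per non-overlapping frame: a crude per-frame aural cue."""
    psi = teager_kaiser(signal)
    frames = [psi[i:i + frame_len] for i in range(0, len(psi), frame_len)]
    return [sum(f) / len(f) for f in frames]


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max normalize a saliency curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(aural: Sequence[float], visual: Sequence[float], textual: Sequence[float],
         weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> List[float]:
    """Weighted linear fusion (one assumed scheme) of frame-aligned saliency curves."""
    if not (len(aural) == len(visual) == len(textual)):
        raise ValueError("curves must be frame-aligned (same length)")
    curves = [normalize(c) for c in (aural, visual, textual)]
    return [sum(w * c[i] for w, c in zip(weights, curves)) for i in range(len(aural))]


def select_frames(saliency: Sequence[float], ratio: float) -> List[int]:
    """Keep the `ratio` most salient frames and return their indices in temporal order."""
    k = max(1, round(ratio * len(saliency)))
    # sorted() is stable, so ties keep their temporal order even with reverse=True.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy per-frame curves for illustration only (not data from the paper).
    aural = [0.1, 0.9, 0.2, 0.8, 0.1, 0.0]
    visual = [0.2, 0.7, 0.1, 0.9, 0.3, 0.1]
    textual = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(select_frames(fuse(aural, visual, textual), 0.34))  # prints [1, 3]
```

Trying a different fusion scheme only means replacing `fuse`, for example with a per-frame maximum over the normalized curves. A practical summarizer would also smooth the fused curve and select contiguous segments rather than isolated frames. This sketch omits both steps.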