Published in: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing

Abstract

We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses an editing style learned from a single example video to combine clips coherently. The editing model is useful for tasks such as generating news clip montages and highlight reels from a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE-edited and human-edited videos in how well they matched the text query or in the level of interest each generated, suggesting that CMVE is able to effectively integrate semantic information across visual and textual modalities and create perceptually coherent, quality videos typical of human video editors. We publicly release an online demonstration of our method.
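The abstract only summarizes the approach, so the following is a toy sketch of the general idea of ordering clips by combining text relevance with visual continuity; it is not the CMVE model itself. Everything in it is an assumption made for illustration: the function names `cosine` and `assemble`, the `text_weight` parameter, the greedy selection rule, and the hand-written embedding vectors standing in for real visual and textual features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assemble(query_vec, clips, length, text_weight=0.5):
    """Greedily order clips (hypothetical rule, not the paper's method).

    The first clip is the one whose text embedding best matches the query.
    Each later clip maximizes a weighted sum of query relevance and visual
    similarity to the previously chosen clip.
    """
    remaining = list(clips)
    sequence = []
    while remaining and len(sequence) < length:
        def score(clip):
            relevance = cosine(query_vec, clip["text"])
            if not sequence:
                return relevance
            continuity = cosine(sequence[-1]["visual"], clip["visual"])
            return text_weight * relevance + (1 - text_weight) * continuity

        best = max(remaining, key=score)
        sequence.append(best)
        remaining.remove(best)
    return [clip["name"] for clip in sequence]


# Made-up 2-D embeddings standing in for real text and visual features.
clips = [
    {"name": "a", "text": [1.0, 0.0], "visual": [1.0, 0.0]},
    {"name": "b", "text": [0.9, 0.1], "visual": [0.5, 0.5]},
    {"name": "c", "text": [0.0, 1.0], "visual": [0.9, 0.1]},
]
query = [1.0, 0.0]

print(assemble(query, clips, length=3))                   # balanced weighting
print(assemble(query, clips, length=3, text_weight=0.0))  # visual continuity only
```

With these toy vectors, the balanced weighting favors clip `b` after `a` because its description is closer to the query, whereas ignoring the text after the first pick favors clip `c` because it looks more like `a`. A real system would replace the hand-written vectors with learned embeddings and would need a way to capture the editing style of the example video, which this sketch does not attempt.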