Journal of Visual Communication & Image Representation

Decomposing style, content, and motion for videos

Abstract

In this paper, we present the first video decomposition framework, named SyCoMo, that factorizes a video into style, content, and motion. Such a fine-grained decomposition enables flexible video editing and, for the first time, allows for tripartite video synthesis. SyCoMo is a unified and domain-agnostic learning framework which can process videos of various object categories without domain-specific design or supervision. Different from other motion decomposition work, SyCoMo derives motion from style-free content by isolating style from content in the first place. Content is organized into subchannels, each of which corresponds to an atomic motion. This design naturally forms an information bottleneck which facilitates a clean decomposition. Experiments show that SyCoMo decomposes videos of various categories into interpretable content subchannels and meaningful motion patterns. Ablation studies also show that deriving motion from style-free content makes the keypoints or landmarks of the object more accurate. We demonstrate the photorealistic quality of the novel tripartite video synthesis, in addition to three bipartite synthesis tasks, namely style, content, and motion transfer.
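To make the decomposition concrete, the sketch below shows one way a tripartite decompose/recompose interface could be wired up in PyTorch: a style encoder, a content encoder whose output is split into subchannels, a keypoint-style motion readout computed from the style-free content, and a decoder that recombines the three factors so that style, content, and motion can each be taken from a different source video. This is a minimal illustrative sketch; all module names, shapes, and the specific keypoint-based motion readout are assumptions, not the authors' implementation.

```python
# Hypothetical tripartite decompose/recompose sketch (not the SyCoMo code).
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Frame -> global style code (appearance: texture, color)."""
    def __init__(self, style_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, style_dim),
        )
    def forward(self, frame):          # (B, 3, H, W)
        return self.net(frame)         # (B, style_dim)

class ContentEncoder(nn.Module):
    """Frame -> style-free content split into K subchannels; each
    subchannel is meant to carry one atomic motion factor."""
    def __init__(self, k=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, k, 3, 1, 1),
        )
    def forward(self, frame):          # (B, 3, H, W)
        return self.net(frame)         # (B, K, H/2, W/2)

def motion_from_content(content):
    """Motion readout: the soft 2-D location of each content subchannel
    (one keypoint per atomic motion), derived from style-free content."""
    b, k, h, w = content.shape
    probs = content.flatten(2).softmax(-1).view(b, k, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=content.device)
    xs = torch.linspace(-1.0, 1.0, w, device=content.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)      # (B, K)
    x = (probs.sum(dim=2) * xs).sum(dim=2)      # (B, K)
    return torch.stack([x, y], dim=-1)          # (B, K, 2)

class Decoder(nn.Module):
    """Recombine (style, content, motion) into a frame: motion keypoints
    are rendered as Gaussian heatmaps, style is broadcast spatially."""
    def __init__(self, style_dim=64, k=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(style_dim + 2 * k, 64, 3, 1, 1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 3, 3, 1, 1),
        )
    def forward(self, style, content, motion):
        b, k, h, w = content.shape
        ys = torch.linspace(-1.0, 1.0, h, device=content.device).view(1, 1, h, 1)
        xs = torch.linspace(-1.0, 1.0, w, device=content.device).view(1, 1, 1, w)
        mx = motion[..., 0].view(b, k, 1, 1)
        my = motion[..., 1].view(b, k, 1, 1)
        heat = torch.exp(-((xs - mx) ** 2 + (ys - my) ** 2) / 0.05)  # (B, K, h, w)
        sty = style.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return self.net(torch.cat([sty, content, heat], dim=1))      # (B, 3, H, W)

# Tripartite synthesis: style from video A, content from video B, motion from video C.
if __name__ == "__main__":
    enc_s, enc_c, dec = StyleEncoder(), ContentEncoder(), Decoder()
    a, b_, c = (torch.randn(1, 3, 64, 64) for _ in range(3))
    frame = dec(enc_s(a), enc_c(b_), motion_from_content(enc_c(c)))
    print(frame.shape)  # torch.Size([1, 3, 64, 64])
```

Because motion is read off the style-free content maps rather than raw pixels, nuisance appearance factors never enter the motion pathway; this is the mechanism the ablation studies credit for the more accurate keypoints and landmarks.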