International Conference on Pattern Recognition

Let's Play Music: Audio-Driven Performance Video Generation

Abstract

We propose a new task named Audio-driven Performance Video Generation (APVG), which aims to synthesize a video of a person playing a certain instrument, guided by a given music audio clip. Generating high-dimensional, temporally consistent video from the low-dimensional audio modality is challenging. In this paper, we propose a multi-stage framework to generate realistic and synchronized performance videos from given music. First, we provide both global appearance and local spatial information by generating coarse videos and keypoints of the body and hands, respectively, from the given music. Then, we transform the generated keypoints into heatmaps via a differentiable space transformer, since heatmaps provide more spatial information but are harder to generate directly from audio. Finally, we propose a Structured Temporal UNet (STU) to extract both intra-frame structured information and inter-frame temporal consistency, which are obtained via a graph-based structure module and a CNN-GRU based high-level temporal module, respectively, for final video generation. Comprehensive experiments validate the effectiveness of the proposed framework.
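
The abstract does not say how the keypoint-to-heatmap step is implemented. As an illustration only, the sketch below uses a common approach, rendering one isotropic Gaussian per keypoint, and assumes that keypoints are (x, y) pixel coordinates. The function name `keypoints_to_heatmaps` and the `sigma` parameter are hypothetical and are not taken from the paper. The Gaussian expression is smooth in the keypoint coordinates, so the same computation written in an automatic-differentiation framework would pass gradients back to the keypoints. The NumPy version here only shows the forward computation.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render K keypoints, given as (x, y) pixel coordinates, into K Gaussian heatmaps.

    Illustrative assumption, not the paper's method.
    Returns an array of shape (K, height, width) with values in [0, 1].
    """
    keypoints = np.asarray(keypoints, dtype=np.float64)
    ys = np.arange(height, dtype=np.float64)[None, :, None]  # shape (1, H, 1)
    xs = np.arange(width, dtype=np.float64)[None, None, :]   # shape (1, 1, W)
    kx = keypoints[:, 0][:, None, None]                      # shape (K, 1, 1)
    ky = keypoints[:, 1][:, None, None]                      # shape (K, 1, 1)
    # Squared distance from every pixel to each keypoint, broadcast to (K, H, W).
    sq_dist = (xs - kx) ** 2 + (ys - ky) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Two example keypoints on a 64x64 grid; the second lies between pixel centres.
heatmaps = keypoints_to_heatmaps([[10.0, 20.0], [40.5, 5.25]], height=64, width=64)
print(heatmaps.shape)  # (2, 64, 64)
```

A keypoint that falls exactly on a pixel centre yields a peak value of 1 at that pixel. Sub-pixel keypoints yield a slightly lower peak, which is how the heatmap keeps the fractional position.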
