PERSONALIZED SPEECH-TO-VIDEO WITH THREE-DIMENSIONAL (3D) SKELETON REGULARIZATION AND EXPRESSIVE BODY POSES

Abstract

Presented herein are novel embodiments for converting given speech audio or text into a photo-realistic speaking video of a person with synchronized, realistic, and expressive body dynamics. In one or more embodiments, 3D skeleton movements are generated from the audio sequence using a recurrent neural network, and an output video is synthesized via a conditional generative adversarial network (GAN). To make movements realistic and expressive, knowledge of an articulated 3D human skeleton and a learned dictionary of personal speech iconic gestures may be embedded into the generation process in both the learning and testing pipelines. The former prevents the generation of unreasonable body distortion, while the latter helps the model quickly learn meaningful body movement from a few videos. To produce photo-realistic, high-resolution video with motion details, a part-attention mechanism is inserted in the conditional GAN, where each detailed part is automatically zoomed in to have its own discriminator.
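
The abstract does not state how knowledge of the articulated 3D skeleton is embedded in the generation process. As a purely illustrative sketch of one way such knowledge could rule out body distortion, the Python snippet below rescales every bone of a predicted pose to a fixed length while keeping its predicted direction. The five-joint chain, the bone lengths, and the names `PARENTS`, `BONE_LENGTHS`, `enforce_bone_lengths`, and `bone_length` are all assumptions made for this example and are not taken from the patent.

```python
import math

# Hypothetical 5-joint chain: root -> spine -> shoulder -> elbow -> wrist.
# Both tables are illustrative assumptions, not values from the patent.
PARENTS = [-1, 0, 1, 2, 3]                   # parent index per joint; -1 marks the root
BONE_LENGTHS = [0.0, 0.5, 0.2, 0.3, 0.25]    # length of the bone ending at each joint


def enforce_bone_lengths(joints, parents=PARENTS, lengths=BONE_LENGTHS):
    """Return a pose with the predicted bone directions but fixed bone lengths.

    joints: sequence of (x, y, z) positions, ordered so parents precede children.
    """
    fixed = []
    for j, p in enumerate(parents):
        if p < 0:
            # Keep the root joint exactly where it was predicted.
            fixed.append(tuple(joints[j]))
            continue
        # Direction of the predicted bone from parent to child.
        d = [a - b for a, b in zip(joints[j], joints[p])]
        norm = math.sqrt(sum(c * c for c in d))
        if norm < 1e-8:
            # Degenerate bone (child on top of parent): fall back to an arbitrary axis.
            d, norm = [0.0, 1.0, 0.0], 1.0
        # Re-attach the child to the corrected parent at the fixed length.
        fixed.append(tuple(f + lengths[j] * c / norm for f, c in zip(fixed[p], d)))
    return fixed


def bone_length(pose, j, parents=PARENTS):
    """Euclidean length of the bone that ends at joint j."""
    return math.dist(pose[j], pose[parents[j]])
```

Applying `enforce_bone_lengths` to each frame of a predicted sequence would keep limbs from stretching or collapsing between frames. A production system would more likely predict joint rotations and run forward kinematics; this projection is only the simplest stand-in for the idea.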