ACM Transactions on Graphics

Video-Audio Driven Real-Time Facial Animation



Abstract

We present a real-time facial tracking and animation system based on a Kinect sensor with video and audio input. Our method requires no user-specific training and is robust to occlusions, large head rotations, and background noise. Given the color, depth, and speech audio frames captured from an actor, our system first reconstructs 3D facial expressions and 3D mouth shapes from the color and depth input with a multi-linear model. Concurrently, a speaker-independent DNN acoustic model is applied to extract phoneme state posterior probabilities (PSPP) from the audio frames. After that, a lip motion regressor refines the 3D mouth shape based on both the PSPP and the expression weights of the 3D mouth shapes, as well as their confidences. Finally, the refined 3D mouth shape is combined with the other parts of the 3D face to generate the final result. The whole process is fully automatic and executed in real time.

The key component of our system is a data-driven regressor for modeling the correlation between speech data and mouth shapes. Based on a pre-captured database of accurate 3D mouth shapes and associated speech audio from one speaker, the regressor jointly uses the input speech and visual features to refine the mouth shape of a new actor. We also present an improved DNN acoustic model. It not only preserves accuracy but also achieves real-time performance. Our method efficiently fuses visual and acoustic information for 3D facial performance capture. It generates more accurate 3D mouth motions than other approaches that are based on audio or video input only. It also supports video-only or audio-only input for real-time facial animation. We evaluate the performance of our system with speech and facial expressions captured from different actors. Results demonstrate the efficiency and robustness of our method.
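The abstract describes a lip motion regressor that fuses the audio cue (PSPP) with the video cue (tracked 3D mouth shape weights), weighted by their confidences. The minimal sketch below only illustrates that confidence-weighted fusion idea: a plain weighted average stands in for the learned regressor described in the paper, and the function name, blendshape dimensionality, and confidence values are illustrative assumptions rather than the authors' actual implementation.

```python
import numpy as np

def fuse_mouth_weights(video_mouth_w, audio_mouth_w, video_conf, audio_conf):
    """Blend mouth blendshape weights from video tracking and audio prediction.

    A simple confidence-weighted average stands in for the data-driven lip
    motion regressor from the abstract; it only shows how the two cues and
    their per-frame confidences could enter a single refinement step.
    """
    conf = np.clip(np.array([video_conf, audio_conf], dtype=float), 1e-6, None)
    conf /= conf.sum()  # normalize confidences into blending weights
    return conf[0] * np.asarray(video_mouth_w) + conf[1] * np.asarray(audio_mouth_w)

# Toy usage: 8 hypothetical mouth blendshape weights per cue. When the visual
# tracker is more confident (e.g. the mouth is unoccluded), its estimate
# dominates the refined mouth shape.
video_w = np.array([0.2, 0.0, 0.5, 0.1, 0.0, 0.3, 0.0, 0.4])
audio_w = np.array([0.1, 0.1, 0.6, 0.0, 0.0, 0.2, 0.1, 0.5])
refined = fuse_mouth_weights(video_w, audio_w, video_conf=0.9, audio_conf=0.6)
print(refined)
```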


