IEEE Transactions on Multimedia
Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling

Abstract

This paper presents an articulatory modelling approach to converting acoustic speech into realistic mouth animation. We directly model the movements of articulators, such as the lips, tongue, and teeth, using a dynamic Bayesian network (DBN)-based audio-visual articulatory model (AVAM). A multiple-stream structure with a shared articulator layer is adopted in the model to synchronously associate the two building blocks of speech, i.e., audio and video. This model not only describes the synchronization between visual articulatory movements and audio speech, but also reflects the linguistic fact that different articulators evolve asynchronously. We also present a Baum-Welch DBN inversion (DBNI) algorithm to generate optimal facial parameters from audio, given the trained AVAM, under the maximum likelihood (ML) criterion. Extensive objective and subjective evaluations on the JEWEL audio-visual dataset demonstrate that, compared with phonemic HMM approaches, the facial parameters estimated by our approach follow the true parameters more accurately, and the synthesized facial animation sequences are so lively that 38% of them are indistinguishable from the real ones.
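
To make the audio-to-visual inversion idea concrete, below is a minimal, hypothetical Python sketch. It is not the paper's method: a plain Gaussian HMM stands in for the DBN-based AVAM, and the Baum-Welch DBNI algorithm is replaced by simple Viterbi decoding followed by per-state maximum-likelihood visual output. All class and variable names (JointAudioVisualHMM, invert, etc.) are illustrative assumptions.

import numpy as np

class JointAudioVisualHMM:
    """Toy joint audio-visual state model (a stand-in for the DBN-based AVAM)."""

    def __init__(self, trans, audio_means, audio_vars, visual_means):
        self.trans = np.asarray(trans)                # (S, S) state transition probs
        self.audio_means = np.asarray(audio_means)    # (S, Da) audio emission means
        self.audio_vars = np.asarray(audio_vars)      # (S, Da) diagonal variances
        self.visual_means = np.asarray(visual_means)  # (S, Dv) visual (facial) means

    def _log_audio_likelihood(self, audio):
        # log N(audio_t | mu_s, diag(var_s)) for every (frame, state) pair
        diff = audio[:, None, :] - self.audio_means[None, :, :]        # (T, S, Da)
        return -0.5 * np.sum(diff ** 2 / self.audio_vars
                             + np.log(2 * np.pi * self.audio_vars), axis=2)

    def invert(self, audio):
        """Viterbi-decode the audio, then emit per-state ML visual parameters."""
        audio = np.asarray(audio)
        T, S = len(audio), len(self.trans)
        log_b = self._log_audio_likelihood(audio)      # (T, S) emission scores
        log_a = np.log(self.trans + 1e-12)
        delta = np.full((T, S), -np.inf)
        psi = np.zeros((T, S), dtype=int)
        delta[0] = log_b[0]                            # uniform initial prior (assumption)
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_a     # (S_prev, S_next)
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_b[t]
        states = np.zeros(T, dtype=int)                # backtrack the best state path
        states[-1] = delta[-1].argmax()
        for t in range(T - 2, -1, -1):
            states[t] = psi[t + 1, states[t + 1]]
        # ML visual trajectory: the visual mean of each decoded state
        return self.visual_means[states]

# Tiny smoke test with random parameters (2 states, 3-dim audio, 4-dim visual)
rng = np.random.default_rng(0)
model = JointAudioVisualHMM(
    trans=[[0.9, 0.1], [0.2, 0.8]],
    audio_means=rng.normal(size=(2, 3)),
    audio_vars=np.ones((2, 3)),
    visual_means=rng.normal(size=(2, 4)),
)
facial = model.invert(rng.normal(size=(50, 3)))        # (50, 4) facial trajectory

Note that this single-chain sketch cannot capture what the paper's shared articulator layer is for: letting the audio and visual streams evolve semi-asynchronously across articulators. It only illustrates the coarsest version of the ML inversion step.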
