Conference of the International Speech Communication Association

A New Language Independent, Photo-realistic Talking Head Driven by Voice Only



Abstract

We propose a new photo-realistic talking head driven by voice only, i.e. no linguistic information about the voice input is needed. The core of the new talking head is a context-dependent, multilayer Deep Neural Network (DNN), discriminatively trained on hundreds of hours of speaker-independent speech data. The trained DNN is then used to map acoustic speech input probabilistically to 9,000 tied "senone" states. For each photo-realistic talking head, an HMM-based lip motion synthesizer is trained on the speaker's audio/visual training data, where the states are statistically mapped to the corresponding lip images. At test time, for a given speech input, the DNN predicts the likely states with their posterior probabilities, and photo-realistic lip animation is then rendered through the DNN-predicted state lattice. The DNN, trained on English speaker-independent data, has also been tested with input in other languages, e.g. Mandarin and Spanish, to mimic lip movements cross-lingually. Subjective experiments show that the lip motions thus rendered for 15 non-English languages are highly synchronized with the audio input and perceptually photo-realistic to human viewers.
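The pipeline described in the abstract (per-frame state posteriors from the DNN, a state lattice, then lip rendering through that lattice) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: three states stand in for the 9,000 senones, `STATE_LIP_MEANS` holds made-up two-dimensional lip features, the lattice is built by simple top-k pruning, and the path is chosen by a Viterbi search with a hypothetical lip-smoothness penalty in place of the authors' HMM-based synthesizer.

```python
import math

# Hypothetical toy values: 3 states stand in for the ~9,000 tied senones.
# Each state gets an assumed mean lip feature (mouth opening, mouth width);
# in the described system such a mapping is learned from audio/visual data.
STATE_LIP_MEANS = [
    (0.0, 0.5),  # state 0: lips closed
    (0.6, 0.4),  # state 1: lips half open
    (1.0, 0.8),  # state 2: lips wide open
]


def top_k_lattice(posteriors, k):
    """Keep the k most probable states per frame (a pruned state lattice)."""
    lattice = []
    for frame in posteriors:
        ranked = sorted(range(len(frame)), key=lambda s: frame[s], reverse=True)
        lattice.append([(s, frame[s]) for s in ranked[:k]])
    return lattice


def lip_distance(a, b):
    """Euclidean distance between the assumed lip features of two states."""
    return math.dist(STATE_LIP_MEANS[a], STATE_LIP_MEANS[b])


def log_prob(p):
    """Log posterior with a small floor so zero probabilities do not fail."""
    return math.log(max(p, 1e-12))


def best_path(lattice, smooth_weight=1.0):
    """Viterbi search: maximise summed log posteriors minus lip-jump penalties."""
    scores = [log_prob(p) for _, p in lattice[0]]
    back = []  # back[t][i]: best predecessor index in lattice[t] for frame t+1
    for t in range(1, len(lattice)):
        new_scores, pointers = [], []
        for state, p in lattice[t]:
            candidates = [
                scores[i] - smooth_weight * lip_distance(prev_state, state)
                for i, (prev_state, _) in enumerate(lattice[t - 1])
            ]
            best_i = max(range(len(candidates)), key=candidates.__getitem__)
            new_scores.append(candidates[best_i] + log_prob(p))
            pointers.append(best_i)
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final entry to recover the state sequence.
    i = max(range(len(scores)), key=scores.__getitem__)
    path = [lattice[-1][i][0]]
    for t in range(len(back) - 1, -1, -1):
        i = back[t][i]
        path.append(lattice[t][i][0])
    path.reverse()
    return path


# Made-up posteriors for three frames over the three toy states.
frames = [[0.80, 0.15, 0.05], [0.45, 0.50, 0.05], [0.10, 0.30, 0.60]]
lattice = top_k_lattice(frames, k=2)
lip_trajectory = [STATE_LIP_MEANS[s] for s in best_path(lattice)]
```

With `smooth_weight=0` the search reduces to picking the most probable state in each frame, while a large weight favours a path whose lip shapes change as little as possible. A real system would render lip images from the selected states rather than return feature tuples.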
