IEEE Conference on Computer Vision and Pattern Recognition Workshops

Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach



Abstract

We introduce a long short-term memory recurrent neural network (LSTM-RNN) approach for real-time facial animation, which automatically estimates the head rotation and facial action unit activations of a speaker from her speech alone. Specifically, the time-varying, contextual, non-linear mapping between the audio stream and visual facial movements is realized by training an LSTM neural network on a large audio-visual data corpus. In this work, we extract a set of acoustic features from the input audio, including the Mel-scaled spectrogram, Mel-frequency cepstral coefficients, and chromagram, which together effectively represent both the contextual progression and the emotional intensity of the speech. Output facial movements are characterized by a 3D rotation and the blending expression weights of a blendshape model, which can be used directly for animation. Thus, even though our model does not explicitly predict the affective states of the target speaker, her emotional manifestation is recreated via the expression weights of the face model. Experiments on an evaluation dataset of different speakers across a wide range of affective states demonstrate promising results of our approach in real-time speech-driven facial animation.
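The paper itself does not include code; the following is a minimal sketch of the pipeline described in the abstract, assuming librosa for acoustic feature extraction and PyTorch for the recurrent model. Specific values such as 40 mel bands, 13 MFCCs, a 40 ms hop (roughly 25 animation frames per second), 51 blendshape weights, and the sigmoid constraint on the weights are illustrative assumptions, not details taken from the paper.

import numpy as np
import librosa
import torch
import torch.nn as nn

def acoustic_features(wav_path, sr=16000, hop=640):
    # Load speech and extract the three feature families named in the abstract:
    # Mel-scaled spectrogram, MFCCs, and chromagram, frame-aligned via a common hop.
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40, hop_length=hop)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop)
    # Stack into a (frames, dims) matrix; log-compress the mel spectrogram.
    feats = np.vstack([librosa.power_to_db(mel), mfcc, chroma]).T
    return feats.astype(np.float32)

class SpeechToFaceLSTM(nn.Module):
    # LSTM mapping per-frame acoustic features to head rotation and blendshape weights.
    def __init__(self, in_dim, hidden=256, n_blendshapes=51):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 3 + n_blendshapes)  # 3 rotation angles + weights
    def forward(self, x):
        h, _ = self.lstm(x)                    # (batch, frames, hidden)
        out = self.head(h)
        rotation = out[..., :3]                # per-frame head rotation (e.g. Euler angles)
        weights = torch.sigmoid(out[..., 3:])  # assumption: weights constrained to [0, 1]
        return rotation, weights

Usage (with "speech.wav" as a hypothetical input file):

feats = acoustic_features("speech.wav")
model = SpeechToFaceLSTM(in_dim=feats.shape[1])
x = torch.from_numpy(feats).unsqueeze(0)   # (1, frames, dims)
rotation, blend_weights = model(x)         # per-frame head pose and expression weights

In this reading of the abstract, emotion is never predicted as an explicit label: emotional content in the acoustic features is carried through the recurrent mapping and surfaces only in the predicted blendshape weights.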
