
Photo-real talking head with deep bidirectional LSTM


Abstract

Long short-term memory (LSTM) is a specific recurrent neural network (RNN) architecture that is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose to use deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject's talking is first recorded as our training data. The audio/visual stereo data are converted into two parallel temporal sequences: contextual label sequences obtained by force-aligning the audio against the text, and visual feature sequences obtained by applying an active appearance model (AAM) to the lower face region of all training image samples. The deep BLSTM is then trained to learn the regression model by minimizing the sum of squared errors (SSE) in predicting the visual sequence from the label sequence. After testing different network topologies, we found, interestingly, that the best network on our datasets is two BLSTM layers sitting on top of one feed-forward layer. Compared with our previous HMM-based system, the newly proposed deep BLSTM-based system performs better in both objective measurements and subjective A/B tests.
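The abstract describes a regression network consisting of one feed-forward layer with two BLSTM layers stacked on top, trained to map contextual label sequences to AAM visual feature sequences by minimizing the SSE. The sketch below, assuming PyTorch and illustrative dimensions (label, hidden, and AAM feature sizes are not given in the abstract), shows one way such a model and a training step could look; it is not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch and made-up dimensions, of the architecture
# described in the abstract: one feed-forward layer followed by two stacked
# bidirectional LSTM layers, regressing AAM visual features from label sequences.
import torch
import torch.nn as nn

class BLSTMTalkingHead(nn.Module):
    def __init__(self, label_dim=600, hidden_dim=256, aam_dim=60):
        super().__init__()
        # One feed-forward layer at the bottom (reported as the best topology)
        self.ff = nn.Sequential(nn.Linear(label_dim, hidden_dim), nn.Tanh())
        # Two bidirectional LSTM layers on top
        self.blstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=2,
                             bidirectional=True, batch_first=True)
        # Linear output predicting an AAM visual feature vector per frame
        self.out = nn.Linear(2 * hidden_dim, aam_dim)

    def forward(self, labels):            # labels: (batch, frames, label_dim)
        h = self.ff(labels)
        h, _ = self.blstm(h)
        return self.out(h)                # (batch, frames, aam_dim)

model = BLSTMTalkingHead()
criterion = nn.MSELoss(reduction="sum")   # sum of squared errors (SSE)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One hypothetical training step on random stand-in data
labels = torch.randn(4, 100, 600)         # contextual label sequences
visual = torch.randn(4, 100, 60)          # AAM visual feature sequences
loss = criterion(model(labels), visual)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

At synthesis time, predicted AAM parameter trajectories would be decoded back into lower-face images and composited onto the talking-head video; that rendering stage is outside the scope of this sketch.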
