Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network


Abstract

Synthesizing talking faces from text and audio is becoming an increasingly important direction in human-machine and face-to-face interaction. Although progress has been made, several existing methods either model co-articulation unsatisfactorily or ignore the relations between adjacent inputs. Moreover, some of these methods train models on videos with shaky heads or rely on linear face parameterization strategies, which further degrades synthesis quality. To address these issues, this study proposes a sequence-to-sequence convolutional neural network that automatically synthesizes talking face video with accurate lip sync. First, an advanced landmark localization pipeline is used to locate the facial landmarks accurately, which effectively reduces landmark shake. Then, a part-based autoencoder is presented to encode face images into a low-dimensional space and obtain compact representations. A sequence-to-sequence network, trained with multiple loss functions, is also presented to encode the relations between neighboring frames, and talking faces are synthesized through a reconstruction strategy with a decoder. Experiments on two public audio-visual datasets and a new dataset called CCTV news demonstrate the effectiveness of the proposed method against other state-of-the-art methods. © 2020 Elsevier Ltd. All rights reserved.
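
The abstract does not specify the network architecture, so the following is only a rough illustration of the general idea that a convolution along the time axis lets each output frame depend on its neighboring inputs. It is a minimal sketch in plain Python, not the authors' model: the function name `temporal_conv1d`, the zero padding, the kernel layout and the toy values are all assumptions made for this example.

```python
from typing import List, Sequence

Vector = List[float]


def temporal_conv1d(
    frames: Sequence[Vector],
    kernel: Sequence[Sequence[Vector]],
    bias: Vector,
) -> List[Vector]:
    """Apply a 1-D convolution along the time axis with zero padding.

    frames: T input vectors of size d_in (hypothetical per-frame features).
    kernel: k taps, each a d_out x d_in weight matrix (k must be odd).
    bias:   d_out offsets.
    Returns T output vectors of size d_out; each output frame depends on
    (k - 1) / 2 neighbors on either side of the corresponding input frame.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = k // 2
    zero = [0.0] * len(frames[0])  # padding vector used beyond the sequence ends
    outputs = []
    for t in range(len(frames)):
        out = list(bias)
        for tap in range(k):
            src = t + tap - half
            x = frames[src] if 0 <= src < len(frames) else zero
            for o, row in enumerate(kernel[tap]):
                out[o] += sum(w * v for w, v in zip(row, x))
        outputs.append(out)
    return outputs


# Toy usage: three scalar frames, a 3-tap kernel whose weights are all 1.0.
toy_frames = [[1.0], [2.0], [3.0]]
toy_kernel = [[[1.0]], [[1.0]], [[1.0]]]
toy_output = temporal_conv1d(toy_frames, toy_kernel, [0.0])
print(toy_output)  # [[3.0], [6.0], [5.0]]
```

In a real system the inputs would be learned text or audio features and the outputs would be low-dimensional face codes passed to a decoder; the sketch only shows how the window over adjacent frames enters the computation.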


Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society, 2020, issue 2020, 14 pages
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Convolutional neural network; Autoencoder; Regression; Face landmark; Face tracking; Lip sync; Video; Audio


Similar literature

1. Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network [J]. Pattern Recognition: The Journal of the Pattern Recognition Society. 2020.
2. Recurrent neural network-based semantic variational autoencoder for Sequence-to-sequence learning [J]. Jang Myeongjun, Seo Seungwan, Kang Pilsung. Information Sciences: An International Journal. 2019.
3. Detection of Bleeding Events in Electronic Health Record Notes Using Convolutional Neural Network Models Enhanced With Recurrent Neural Network Autoencoders: Deep Learning Approach [J]. Rumeng Li, Baotian Hu, Feifan Liu. JMIR Medical Informatics. 2019, issue 1.
4. Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction [C]. Maha Elbayad, Laurent Besacier, Jakob Verbeek. 22nd Conference on Computational Natural Language Learning. 2018.
5. Deep Neural Language Model for Text Classification Based on Convolutional and Recurrent Neural Networks [D]. Hassan, Abdalraouf. 2018.
6. Fault Diagnosis of Rotating Machinery under Noisy Environment Conditions Based on a 1-D Convolutional Autoencoder and 1-D Convolutional Neural Network [O]. Xingchen Liu, Qicai Zhou, Jiong Zhao. 2019.
7. Recurrent neural network-based semantic variational autoencoder for Sequence-to-sequence learning [O]. Myeongjun Jang, Seungwan Seo, Pilsung Kang. 2019.
