Annual Conference of the International Speech Communication Association

Visual speech synthesis using dynamic visemes, contextual features and DNNs

Abstract

This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN). Two representations of the input text are considered, namely phoneme sequences and dynamic viseme sequences. From these sequences, contextual features are extracted that include information at varying linguistic levels, from the frame level up to the utterance level. These are extracted over a broad sliding window that captures context and produces features that are input into the DNN to estimate visual features. Experiments first compare the accuracy of these visual features against an HMM baseline method, which establishes that both the phoneme and dynamic viseme systems perform better, with the best performance obtained by a combined phoneme-dynamic viseme system. An investigation into the features then reveals the importance of the frame-level information, which avoids discontinuities in the visual feature sequence and produces a smooth and realistic output.
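The pipeline described in the abstract (text mapped to a phoneme or dynamic viseme sequence, per-frame contextual features gathered over a sliding window, and a DNN regressing those features to visual features) can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: the label inventory size, context window width, layer sizes and the one-hot frame-level encoding (N_LABELS, CONTEXT, VISUAL_DIM) are assumptions standing in for the paper's multi-level linguistic features and visual parameterisation.

```python
# Minimal sketch (not the paper's implementation): a feedforward DNN mapping
# sliding-window contextual features to per-frame visual feature vectors.
# Assumption: one-hot phoneme/dynamic-viseme labels per frame stand in for the
# paper's multi-level linguistic features; all sizes below are illustrative.
import torch
import torch.nn as nn

N_LABELS = 40      # size of the phoneme / dynamic viseme inventory (assumed)
CONTEXT = 5        # frames of context on each side of the current frame (assumed)
VISUAL_DIM = 30    # dimensionality of the visual feature vector (assumed)

def contextual_features(frame_labels: torch.Tensor) -> torch.Tensor:
    """Stack one-hot frame labels over a sliding window of +/- CONTEXT frames."""
    one_hot = torch.nn.functional.one_hot(frame_labels, N_LABELS).float()   # (T, N_LABELS)
    padded = torch.nn.functional.pad(one_hot.T, (CONTEXT, CONTEXT)).T       # zero-pad edge frames
    windows = [padded[i:i + 2 * CONTEXT + 1].reshape(-1)
               for i in range(len(frame_labels))]
    return torch.stack(windows)                                             # (T, (2*CONTEXT+1)*N_LABELS)

# Feedforward regression network from contextual features to visual features.
dnn = nn.Sequential(
    nn.Linear((2 * CONTEXT + 1) * N_LABELS, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, VISUAL_DIM),
)

def train_step(frame_labels, visual_targets, optimiser, loss_fn=nn.MSELoss()):
    """One gradient step: predict visual features for every frame and regress to targets."""
    optimiser.zero_grad()
    pred = dnn(contextual_features(frame_labels))
    loss = loss_fn(pred, visual_targets)
    loss.backward()
    optimiser.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage on random data, just to show the shapes involved.
    labels = torch.randint(0, N_LABELS, (100,))    # 100 frames of phoneme/viseme ids
    targets = torch.randn(100, VISUAL_DIM)         # matching visual feature vectors
    opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)
    print(train_step(labels, targets, opt))
```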
