Annual Conference of the International Speech Communication Association

Visual speech synthesis using dynamic visemes, contextual features and DNNs

Abstract

This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN). Two representations of the input text are considered, namely phoneme sequences and dynamic viseme sequences. From these sequences, contextual features are extracted that include information at varying linguistic levels, from the frame level up to the utterance level. These are extracted over a broad sliding window that captures context and produces features that are input into the DNN to estimate visual features. Experiments first compare the accuracy of these visual features against an HMM baseline method, which establishes that both the phoneme and dynamic viseme systems perform better, with the best performance obtained by a combined phoneme-dynamic viseme system. An investigation into the features then reveals the importance of the frame-level information, which avoids discontinuities in the visual feature sequence and produces a smooth and realistic output.
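The pipeline described in the abstract (text mapped to a phoneme or dynamic viseme sequence, per-frame contextual features gathered over a sliding window, and a DNN regressing those features to visual features) can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: the label inventory size, context window width, layer sizes and the one-hot frame-level encoding (N_LABELS, CONTEXT, VISUAL_DIM) are assumptions standing in for the paper's multi-level linguistic features and visual parameterisation.

```python
# Minimal sketch (not the paper's implementation): a feedforward DNN mapping
# sliding-window contextual features to per-frame visual feature vectors.
# Assumption: one-hot phoneme/dynamic-viseme labels per frame stand in for the
# paper's multi-level linguistic features; all sizes below are illustrative.
import torch
import torch.nn as nn

N_LABELS = 40      # size of the phoneme / dynamic viseme inventory (assumed)
CONTEXT = 5        # frames of context on each side of the current frame (assumed)
VISUAL_DIM = 30    # dimensionality of the visual feature vector (assumed)

def contextual_features(frame_labels: torch.Tensor) -> torch.Tensor:
    """Stack one-hot frame labels over a sliding window of +/- CONTEXT frames."""
    one_hot = torch.nn.functional.one_hot(frame_labels, N_LABELS).float()   # (T, N_LABELS)
    padded = torch.nn.functional.pad(one_hot.T, (CONTEXT, CONTEXT)).T       # zero-pad edge frames
    windows = [padded[i:i + 2 * CONTEXT + 1].reshape(-1)
               for i in range(len(frame_labels))]
    return torch.stack(windows)                                             # (T, (2*CONTEXT+1)*N_LABELS)

# Feedforward regression network from contextual features to visual features.
dnn = nn.Sequential(
    nn.Linear((2 * CONTEXT + 1) * N_LABELS, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, VISUAL_DIM),
)

def train_step(frame_labels, visual_targets, optimiser, loss_fn=nn.MSELoss()):
    """One gradient step: predict visual features for every frame and regress to targets."""
    optimiser.zero_grad()
    pred = dnn(contextual_features(frame_labels))
    loss = loss_fn(pred, visual_targets)
    loss.backward()
    optimiser.step()
    return loss.item()

if __name__ == "__main__":
    # Toy usage on random data, just to show the shapes involved.
    labels = torch.randint(0, N_LABELS, (100,))    # 100 frames of phoneme/viseme ids
    targets = torch.randn(100, VISUAL_DIM)         # matching visual feature vectors
    opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)
    print(train_step(labels, targets, opt))
```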
