Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Expressive visual text to speech and expression adaptation using deep neural networks

Abstract

In this paper, we present an expressive visual text-to-speech (VTTS) system based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS system produces not only the audio speech but also the accompanying facial movements. The expression can be either one of the expressions in the training corpus or a blend of them. Furthermore, we present a method for adapting a previously trained DNN to include a new expression using a small amount of training data. Experiments show that the proposed DNN-based VTTS is preferred by 57.9% over the baseline hidden Markov model-based VTTS, which uses cluster adaptive training.
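The abstract does not say how expression tags are encoded for the network. As a rough illustration only, one plausible scheme is a weight vector over the training expressions: a single expression becomes a one-hot vector, and a blend becomes a normalized mix of weights. The expression names, the `expression_vector` and `network_input` helpers, and the idea of concatenating the vector with linguistic features are all assumptions made for this sketch, not details taken from the paper.

```python
# Hypothetical sketch: the expression names and the weight-vector encoding are
# assumptions for illustration, not details taken from the paper.
EXPRESSIONS = ["neutral", "happy", "sad", "angry"]  # assumed training-corpus expressions


def expression_vector(weights):
    """Turn {expression: weight} into a normalized weight vector over EXPRESSIONS."""
    unknown = set(weights) - set(EXPRESSIONS)
    if unknown:
        raise ValueError(f"unknown expressions: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Expressions not mentioned get weight 0; the result sums to 1.
    return [weights.get(name, 0.0) / total for name in EXPRESSIONS]


def network_input(linguistic_features, weights):
    """Concatenate linguistic features with the expression vector (assumed input layout)."""
    return list(linguistic_features) + expression_vector(weights)


single = expression_vector({"happy": 1.0})            # one training expression
blend = expression_vector({"happy": 1, "sad": 1})     # equal blend of two expressions
frame = network_input([0.2, 0.7], {"neutral": 1})     # toy two-dimensional features
```

Under this assumed encoding, a blend is simply a point between the one-hot corners, so no separate model is needed for mixed expressions. The paper's actual representation may differ.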
