International Conference on Multimedia and Expo

USING VISEME BASED ACOUSTIC MODELS FOR SPEECH DRIVEN LIP SYNTHESIS



Abstract

Speech-driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time-aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. In this paper, we present a novel method for generating the viseme sequence that uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal. This improves both the accuracy and the speed of the alignment procedure and allows a much simpler implementation of the speech-driven lip synthesis system, as it completely obviates the need for acoustic-unit-to-visual-unit conversion. We show through various experiments that the proposed method yields about a 53% relative improvement in classification accuracy and about a 52% reduction in the time required to compute alignments.
机译:语音驱动的唇缘合成是朝着人机交互的有趣和重要的一步。输入的语音信号是使用语音识别器对齐的时间对齐,以生成语音序列,然后将其转换为要动画的相应的模糊序列。在本文中,我们提出了一种用于生成血管序列的新方法,它使用基于Viseme的声学模型,而不是通常的电话基声学模型,以对准输入语音信号。这导致对准过程的更高准确性和速度,并且允许语音驱动唇合成系统的更简单的实现,因为它完全消除了声学单元对视觉单元转换的要求。我们通过各种实验表明,所提出的方法导致分类精度的相对改善约为53%,而计算对准所需的时间约为52%。
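To illustrate the pipeline the abstract describes, here is a minimal sketch of the phone-to-viseme conversion step that conventional phone-based systems require and that the proposed viseme-based acoustic models eliminate. The phone labels, viseme classes, and mapping table below are hypothetical examples for illustration, not the paper's actual inventory:

```python
# Hypothetical many-to-one phone-to-viseme table (illustrative only).
PHONE_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "iy": "spread", "ih": "spread",
    "sil": "neutral",
}

def phones_to_visemes(aligned_phones):
    """Convert a time-aligned phone sequence [(phone, start, end), ...]
    into a viseme sequence, merging adjacent segments that map to the
    same viseme (the many-to-one collapse the paper avoids by aligning
    directly with viseme-based acoustic models)."""
    visemes = []
    for phone, start, end in aligned_phones:
        viseme = PHONE_TO_VISEME.get(phone, "neutral")
        if visemes and visemes[-1][0] == viseme:
            # Same viseme as the previous segment: extend it.
            prev = visemes[-1]
            visemes[-1] = (viseme, prev[1], end)
        else:
            visemes.append((viseme, start, end))
    return visemes

if __name__ == "__main__":
    alignment = [("sil", 0.0, 0.1), ("p", 0.1, 0.2),
                 ("b", 0.2, 0.3), ("aa", 0.3, 0.5)]
    print(phones_to_visemes(alignment))
    # → [('neutral', 0.0, 0.1), ('bilabial', 0.1, 0.3), ('open', 0.3, 0.5)]
```

Because several phones share one mouth shape, this conversion is lossy and adds a post-processing pass; aligning with viseme-based acoustic models produces the viseme sequence directly.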
