International Conference on Multimedia and Expo

USING VISEME BASED ACOUSTIC MODELS FOR SPEECH DRIVEN LIP SYNTHESIS



Abstract

Speech-driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time-aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. In this paper, we present a novel method for generating the viseme sequence that uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal. This improves both the accuracy and the speed of the alignment procedure and allows a much simpler implementation of the speech-driven lip synthesis system, as it completely obviates the need for acoustic-unit-to-visual-unit conversion. We show through various experiments that the proposed method yields about a 53% relative improvement in classification accuracy and about a 52% reduction in the time required to compute alignments.
机译:语音驱动的唇缘合成是朝着人机交互的有趣和重要的一步。输入的语音信号是使用语音识别器对齐的时间对齐,以生成语音序列,然后将其转换为要动画的相应的模糊序列。在本文中,我们提出了一种用于生成血管序列的新方法,它使用基于Viseme的声学模型,而不是通常的电话基声学模型,以对准输入语音信号。这导致对准过程的更高准确性和速度,并且允许语音驱动唇合成系统的更简单的实现,因为它完全消除了声学单元对视觉单元转换的要求。我们通过各种实验表明,所提出的方法导致分类精度的相对改善约为53%,而计算对准所需的时间约为52%。
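To illustrate the pipeline the abstract describes, here is a minimal sketch of the phone-to-viseme conversion step that conventional phone-based systems require and that the proposed viseme-based acoustic models eliminate. The phone labels, viseme classes, and mapping table below are hypothetical examples for illustration, not the paper's actual inventory:

```python
# Hypothetical many-to-one phone-to-viseme table (illustrative only).
PHONE_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "iy": "spread", "ih": "spread",
    "sil": "neutral",
}

def phones_to_visemes(aligned_phones):
    """Convert a time-aligned phone sequence [(phone, start, end), ...]
    into a viseme sequence, merging adjacent segments that map to the
    same viseme (the many-to-one collapse the paper avoids by aligning
    directly with viseme-based acoustic models)."""
    visemes = []
    for phone, start, end in aligned_phones:
        viseme = PHONE_TO_VISEME.get(phone, "neutral")
        if visemes and visemes[-1][0] == viseme:
            # Same viseme as the previous segment: extend it.
            prev = visemes[-1]
            visemes[-1] = (viseme, prev[1], end)
        else:
            visemes.append((viseme, start, end))
    return visemes

if __name__ == "__main__":
    alignment = [("sil", 0.0, 0.1), ("p", 0.1, 0.2),
                 ("b", 0.2, 0.3), ("aa", 0.3, 0.5)]
    print(phones_to_visemes(alignment))
    # → [('neutral', 0.0, 0.1), ('bilabial', 0.1, 0.3), ('open', 0.3, 0.5)]
```

Because several phones share one mouth shape, this conversion is lossy and adds a post-processing pass; aligning with viseme-based acoustic models produces the viseme sequence directly.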
