Accurate visible speech synthesis based on concatenating variable length motion capture data

Ma J.; Cole R.; Pellom B.; Ward W.; Wise B.

Abstract

We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long-distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated, and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
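
The abstract mentions a search for optimal sequences of variable-length units but gives no details of the cost function or the search procedure. The following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that units are contiguous phoneme runs found in a corpus of transcribed utterances and that the best cover of a target is simply the one with the fewest units (and so the fewest joins). The names `index_corpus`, `select_units`, and `max_len` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A unit is a slice of one corpus utterance: (utterance index, start, end).
Unit = Tuple[int, int, int]


def index_corpus(corpus: List[List[str]], max_len: int) -> Dict[Tuple[str, ...], Unit]:
    """Map every phoneme run of up to max_len symbols to its first occurrence."""
    index: Dict[Tuple[str, ...], Unit] = {}
    for u, utt in enumerate(corpus):
        for start in range(len(utt)):
            for end in range(start + 1, min(start + max_len, len(utt)) + 1):
                index.setdefault(tuple(utt[start:end]), (u, start, end))
    return index


def select_units(target: List[str], corpus: List[List[str]],
                 max_len: int = 6) -> Optional[List[Unit]]:
    """Cover `target` with the fewest corpus units using dynamic programming.

    Assumption: cost = number of units. A real system would also score how
    smoothly adjacent units join and how well each unit's context matches.
    """
    index = index_corpus(corpus, max_len)
    n = len(target)
    inf = float("inf")
    best = [0.0] + [inf] * n                   # best[i]: min units covering target[:i]
    back: List[Optional[int]] = [None] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] + 1 < best[i] and tuple(target[j:i]) in index:
                best[i] = best[j] + 1
                back[i] = j
    if best[n] == inf:
        return None                            # some phoneme run is not in the corpus
    units: List[Unit] = []
    i = n
    while i > 0:                               # walk the back-pointers to recover the cover
        j = back[i]
        units.append(index[tuple(target[j:i])])
        i = j
    units.reverse()
    return units


corpus = [["h", "e", "l", "o"], ["w", "er", "l", "d"], ["l", "o", "w"]]
print(select_units(["h", "e", "l", "o", "w", "er"], corpus))
```

Each returned triple points to the corpus slice whose motion capture frames would be concatenated. Because fewer units means longer units, this cost favors long stretches of naturally coarticulated motion, which is the usual motivation for variable-length selection.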

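Bibliographic details

Source
IEEE Transactions on Visualization and Computer Graphics | 2006, Issue 2 | pp. 266-276 | 11 pages
Authors
Ma J.; Cole R.; Pellom B.; Ward W.; Wise B.
Original format: PDF
Text language: English
Chinese Library Classification: Pattern recognition and devices
Keywords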
computer animation; face recognition; image motion analysis; learning (artificial intelligence); search problems; solid modelling; speech synthesis; 3D face model; coarticulation effect; facial motion; lip motion; machine learning technique; motion capture data; opti;

Date indexed: 2022-08-17 13:42:23
