We present Mike Talk, a text-to-audiovisual speech synthesizer that converts input text into an audiovisual speech stream. Mike Talk is built using visemes, which are a set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject, which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, the correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images can be generated.
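To make the morphing step concrete, here is a minimal sketch of generating one in-between frame from two viseme images and a flow field relating them. It is an illustration under stated assumptions, not the authors' implementation: the function names `warp` and `morph`, the grayscale list-of-rows image format, the nearest-neighbour sampling, and the backward-warping approximation (the flow is evaluated at the intermediate pixel position) are all hypothetical choices made for brevity. The flow field itself is assumed to be given, for example from any optical flow estimator.

```python
def warp(image, flow_x, flow_y, scale):
    """Backward-warp a grayscale image (list of rows) by scale * flow.

    Each output pixel (x, y) is sampled from the input at
    (x + scale * flow_x[y][x], y + scale * flow_y[y][x]),
    rounded to the nearest pixel and clamped to the image border.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = min(max(round(x + scale * flow_x[y][x]), 0), w - 1)
            sy = min(max(round(y + scale * flow_y[y][x]), 0), h - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


def morph(viseme_a, viseme_b, flow_x, flow_y, alpha):
    """Blend two visemes at position alpha in [0, 1] along the flow.

    Assumption: (flow_x, flow_y) maps a pixel in viseme_a to its
    corresponding pixel in viseme_b. Both images are warped toward the
    intermediate position and then cross-dissolved.
    """
    a_part = warp(viseme_a, flow_x, flow_y, -alpha)        # A moved forward
    b_part = warp(viseme_b, flow_x, flow_y, 1.0 - alpha)   # B moved backward
    return [
        [(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a_part, b_part)
    ]


# Toy example: a bright pixel that moves two columns to the right.
viseme_a = [[0, 9, 0, 0, 0, 0]]
viseme_b = [[0, 0, 0, 9, 0, 0]]
flow_x = [[2] * 6]
flow_y = [[0] * 6]
halfway = morph(viseme_a, viseme_b, flow_x, flow_y, 0.5)
```

In the toy example the bright pixel lands midway between its two endpoint positions instead of fading out in one place and fading in at another, which is what distinguishes a flow-guided morph from a plain cross-dissolve. Stepping `alpha` from 0 to 1 would yield the sequence of transition frames.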