This paper describes a transmodal mapping from audio speech to talking faces based on hidden Markov models (HMMs). If facial movements can be synthesized well enough for natural communication, human-machine communication stands to benefit greatly. The paper presents an HMM-based speech-driven lip movement synthesis method, its improvement through audio-visual joint estimation, and its extension to talking face generation. Evaluation experiments show that the proposed method generates natural and accurate talking faces from audio speech input.
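To give an intuition for the transmodal mapping, the sketch below shows a toy discrete HMM (not the paper's actual model, whose states, features, and parameters are all hypothetical here): hidden states stand for lip shapes (visemes), observations stand for quantized audio features, and Viterbi decoding recovers the most likely lip-shape sequence from the audio sequence.

```python
# Toy sketch of HMM-based speech-to-lip mapping. All states, symbols,
# and probabilities below are made up for illustration only.

STATES = ["closed", "open", "rounded"]   # hypothetical visemes
OBS = ["silence", "vowel_a", "vowel_o"]  # hypothetical quantized audio symbols

start = {"closed": 0.6, "open": 0.2, "rounded": 0.2}
trans = {
    "closed":  {"closed": 0.5, "open": 0.3, "rounded": 0.2},
    "open":    {"closed": 0.3, "open": 0.5, "rounded": 0.2},
    "rounded": {"closed": 0.3, "open": 0.2, "rounded": 0.5},
}
emit = {
    "closed":  {"silence": 0.8, "vowel_a": 0.1, "vowel_o": 0.1},
    "open":    {"silence": 0.1, "vowel_a": 0.8, "vowel_o": 0.1},
    "rounded": {"silence": 0.1, "vowel_a": 0.1, "vowel_o": 0.8},
}

def viterbi(observations):
    """Return the most likely viseme sequence for an audio observation sequence."""
    # delta[s] = probability of the best path ending in state s so far
    delta = {s: start[s] * emit[s][observations[0]] for s in STATES}
    back = []  # backpointers, one dict per time step after the first
    for o in observations[1:]:
        prev = delta
        delta, ptr = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev[p] * trans[p][s])
            delta[s] = prev[best_prev] * trans[best_prev][s] * emit[s][o]
            ptr[s] = best_prev
        back.append(ptr)
    # Trace the best path backwards from the most likely final state.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

print(viterbi(["silence", "vowel_a", "vowel_o"]))
# → ['closed', 'open', 'rounded']
```

In a real system the discrete symbols would be replaced by continuous acoustic features (e.g. with Gaussian-mixture emissions), and the decoded state sequence would drive lip-shape parameters rather than labels.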