In the emerging field of speech-to-speech translation, emphasis is currently placed on the linguistic content, while the significance of paralinguistic information conveyed by facial expression or tone of voice is typically neglected. We present a prototype system for multimodal speech-to-speech translation that is able to automatically recognize and translate spoken utterances from one language into another, with the output rendered by a speech synthesis system. The novelty of our system lies in the technique of generating the synthetic speech output in one of several expressive styles that is automatically determined using a camera to analyze the user's facial expression during speech.
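For concreteness, the following is a minimal sketch of how the described pipeline (speech recognition, facial-expression analysis, translation, and expressive synthesis) might be orchestrated. All function and class names are hypothetical placeholders chosen for illustration and do not correspond to the prototype's actual components.

```python
# Hypothetical sketch of the multimodal translation pipeline described above.
# Every component here is a placeholder stub, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class TranslationResult:
    source_text: str   # recognized utterance in the source language
    target_text: str   # translated utterance in the target language
    style: str         # expressive style inferred from the speaker's face
    audio: bytes       # synthesized waveform in the target language


def recognize_speech(audio: bytes, source_lang: str) -> str:
    """Placeholder ASR: transcribe the spoken utterance."""
    return "hello"


def classify_expression(video_frames: list) -> str:
    """Placeholder facial-expression analysis over camera frames captured
    while the user speaks; returns an expressive style label."""
    return "neutral"


def translate_text(text: str, source_lang: str, target_lang: str) -> str:
    """Placeholder machine translation of the recognized text."""
    return text


def synthesize_expressive(text: str, lang: str, style: str) -> bytes:
    """Placeholder expressive TTS: render the translation in the inferred style."""
    return b""


def translate_utterance(audio: bytes, video_frames: list,
                        source_lang: str, target_lang: str) -> TranslationResult:
    """Run the full pipeline: ASR -> expression analysis -> MT -> expressive TTS."""
    source_text = recognize_speech(audio, source_lang)
    style = classify_expression(video_frames)
    target_text = translate_text(source_text, source_lang, target_lang)
    waveform = synthesize_expressive(target_text, target_lang, style)
    return TranslationResult(source_text, target_text, style, waveform)
```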