PROBLEM TO BE SOLVED: To automatically synthesize high-quality video images that express lip movement and sign language, and to present them multimodally together with voice information. SOLUTION: Video images of lip movement and sign-language movement are photographed beforehand and recorded in a moving picture element piece database part 4 for each single syllable, word, or sentence. A moving picture element piece detection part 5 retrieves the relevant moving picture element pieces from the database part 4 according to the text supplied from a text sentence preparation part 3. Based on the retrieved pieces, a motion vector detection system 6 or the like detects the motion vectors of the video images at the joints between successive pieces. A field interpolation part varies the number of fields interpolated at each joint, and the interpolation positions, according to those motion vectors, so that the pieces are joined and synthesized smoothly while the flow of the video images remains natural. The synthesized moving images are also synchronized with the voice output.
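The joining step can be illustrated with a minimal sketch. The abstract does not state how the motion vector maps to the number of interpolated fields or to their positions, so everything below is an assumption made for illustration: the function names, the `pixels_per_field` and `max_fields` parameters, the rule that larger motion across a joint yields more interpolated fields, and the even spacing of interpolation positions. A single tracked point (for example, a lip corner) stands in for the video image.

```python
import math


def interpolation_field_count(motion_vector, pixels_per_field=4.0, max_fields=8):
    """Hypothetical rule: the larger the motion across a joint,
    the more fields are interpolated (capped at max_fields)."""
    dx, dy = motion_vector
    magnitude = math.hypot(dx, dy)
    return min(max_fields, math.ceil(magnitude / pixels_per_field))


def interpolation_positions(count):
    """Assumed placement: evenly spaced fractional positions strictly
    between the last field of one piece and the first field of the next."""
    return [(i + 1) / (count + 1) for i in range(count)]


def join_positions(end_pos, start_pos, pixels_per_field=4.0, max_fields=8):
    """Interpolate a tracked point between the end of piece A (end_pos)
    and the start of piece B (start_pos)."""
    # Motion vector across the joint, here taken as the point displacement.
    mv = (start_pos[0] - end_pos[0], start_pos[1] - end_pos[1])
    count = interpolation_field_count(mv, pixels_per_field, max_fields)
    return [
        (end_pos[0] + t * mv[0], end_pos[1] + t * mv[1])
        for t in interpolation_positions(count)
    ]
```

Under these assumptions, a joint with no motion receives no interpolated fields, while a joint with large motion receives up to `max_fields` intermediate fields, which spreads the displacement over more fields and keeps the transition gradual.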