This contribution addresses the creation and selection of visual front-end speech features. It describes both shape-based and appearance-based visual features, which can be used for visual-only or audio-visual speech recognition. Before use, the features must be normalized and selected so that the recognition rate is sufficiently high. A second task was the fusion of different kinds of visual and acoustic speech features. The work concludes with experiments on audio-visual recognition of isolated words.