Automatic lip-reading, which uses video sequences of the speaker's mouth, has attracted significant interest as a way to make automatic speech recognition more robust in noisy environments. However, its recognition rate is still insufficient. In this paper, we investigate how the analysis frame interval and the image resolution affect lip-reading performance. Experiments over a range of analysis frame intervals, using video sequences recorded with a high-speed camera, show that a higher frame rate is effective for achieving high recognition performance. Further experiments over a range of image resolutions show that recognition performance does not depend on image resolution. These results suggest that, for the visual feature vectors extracted by our image-based approach, the resolution can be reduced to 20×15 pixels.
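The abstract does not describe how the image-based feature vectors are computed. The sketch below is a hypothetical illustration of the two experimental variables only. It assumes each frame is a grayscale mouth-region image stored as a list of rows. It simulates a longer analysis frame interval by keeping every `interval`-th frame, and it simulates a lower resolution by block-averaging each frame down to a 20×15 grid that is then flattened into a 300-dimensional vector. The function names and the block-averaging step are assumptions for illustration, not the authors' method.

```python
from typing import List

Frame = List[List[float]]  # hypothetical: grayscale mouth-region image, rows x columns


def subsample_frames(frames: List[Frame], interval: int) -> List[Frame]:
    """Keep every `interval`-th frame to simulate a longer analysis frame interval."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]


def block_average(frame: Frame, out_w: int = 20, out_h: int = 15) -> List[float]:
    """Reduce a frame to out_h x out_w cells by averaging the pixels in each cell,
    then flatten the cells row by row into a feature vector.
    Assumes the frame size is an exact multiple of the target grid."""
    h, w = len(frame), len(frame[0])
    if h % out_h or w % out_w:
        raise ValueError("frame size must be a multiple of the target grid")
    bh, bw = h // out_h, w // out_w
    cells = []
    for r in range(out_h):
        for c in range(out_w):
            total = 0.0
            for y in range(r * bh, (r + 1) * bh):
                for x in range(c * bw, (c + 1) * bw):
                    total += frame[y][x]
            cells.append(total / (bh * bw))
    return cells


def feature_sequence(frames: List[Frame], interval: int,
                     out_w: int = 20, out_h: int = 15) -> List[List[float]]:
    """Hypothetical pipeline: subsample frames in time, then reduce each one in space."""
    return [block_average(f, out_w, out_h) for f in subsample_frames(frames, interval)]


if __name__ == "__main__":
    # Ten synthetic 80x60 frames; frame i has every pixel equal to i.
    frames = [[[float(i)] * 80 for _ in range(60)] for i in range(10)]
    seq = feature_sequence(frames, interval=2)
    print(len(seq), len(seq[0]))
```

With ten 80×60 input frames and `interval=2`, the sketch yields five feature vectors of 20 × 15 = 300 values each. Block averaging is used here only because it is a simple way to lower resolution while keeping each vector's length fixed.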