Videotext can be an efficient semantic index and summary for instructional videos. However, videotext usually appears in different visual formats: handwritten slides, electronic slides, book pages, web pages, handwriting on a chalkboard, etc. We propose a unified approach that handles all these kinds of videotext in three steps. First, we detect still video segments by analyzing motion energy patterns in instructional videos, and construct a quality-enhanced candidate text frame for each still video segment. Then, we use a trained SVM classifier to verify the candidate text frames, and to segment the text region and individual text blocks from the verified frames. Finally, we filter out redundant text frames with similar text content using a Hausdorff distance-based image comparison algorithm. The resulting text frames are automatically organized into HTML and PDF documents that serve as an image-based summary of the instructional videos. We apply our method to 75 instructional videos from five different courses and discuss its applications.
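As a rough illustration of the first and third steps, the following minimal sketch detects still segments by thresholding a simple motion energy measure and then drops redundant frames using the Hausdorff distance between point sets. It is not the authors' implementation: the motion energy definition (mean absolute frame difference), the thresholds, the point-set representation of text pixels, and all function names are assumptions made for illustration, and the SVM verification step is omitted.

```python
from math import hypot


def motion_energy(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames.

    Assumption: frames are non-empty lists of rows of numeric intensities.
    """
    total = sum(
        abs(a - b)
        for row_p, row_c in zip(prev, curr)
        for a, b in zip(row_p, row_c)
    )
    return total / (len(curr) * len(curr[0]))


def still_segments(frames, threshold=1.0, min_length=3):
    """Return inclusive (start, end) index pairs of low-motion runs of frames.

    threshold and min_length are hypothetical tuning parameters.
    """
    segments = []
    start = 0
    for i in range(1, len(frames) + 1):
        # A run ends at the last frame or when motion exceeds the threshold.
        ends_here = (
            i == len(frames)
            or motion_energy(frames[i - 1], frames[i]) > threshold
        )
        if ends_here:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = i
    return segments


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty 2D point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def filter_redundant(point_sets, max_distance=2.0):
    """Return indices of point sets that are not close to an earlier kept one.

    Assumption: each text frame has been reduced to a non-empty list of
    (x, y) text-pixel coordinates; max_distance is a hypothetical threshold.
    """
    kept = []
    for idx, pts in enumerate(point_sets):
        if all(hausdorff(pts, point_sets[k]) > max_distance for k in kept):
            kept.append(idx)
    return kept
```

In this sketch, a frame is kept only if its Hausdorff distance to every previously kept frame exceeds `max_distance`, so near-duplicate text frames collapse to their first occurrence. The brute-force distance computation is quadratic in the number of points and is meant only to show the idea.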