This paper presents the results of experiments with a simple ultrasonic lip motion detector, or "Ultrasonic Mike", in automatic speech recognition. The device is tested in a speaker-dependent, isolated-word recognition task with a vocabulary consisting of the spoken digits zero through nine. The "Ultrasonic Mike" serves as the input to an automatic lip reader, which uses template matching and dynamic time warping to determine the best candidate for a given test utterance. The device is first tested as a stand-alone automatic lip reader, achieving accuracy as high as 89%. Next, the automatic lip reader is combined with a conventional automatic speech recognizer. Classifier fusion is based on a pseudo-probability mass function derived from the dynamic time warping distances. The combined system is tested with various levels of added acoustic noise. In a typical example, at 0 dB, the acoustic recognizer's accuracy was 78% and the lip reader's accuracy was 69%, but the combined accuracy was 93%. This experiment demonstrates that this simple ultrasonic lip motion detector, whose output data rate is 12,500 times lower than that of a typical video camera, can improve automatic speech recognition in noisy environments. It also demonstrates an effective classifier fusion algorithm based on dynamic time warping distances.
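
The recognition and fusion steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments. The absolute-difference local cost, the inverse-distance mapping from distances to a pseudo-probability mass function, the product-rule fusion, and all function names (`dtw_distance`, `pseudo_pmf`, `classify`, `fuse`) are assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Assumption: the local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def pseudo_pmf(distances):
    """Turn per-word DTW distances into masses that sum to one.

    Assumption: mass is proportional to inverse distance, so a smaller
    distance gives a larger mass. The source does not give its formula.
    """
    eps = 1e-9  # avoids division by zero for an exact template match
    weights = {word: 1.0 / (d + eps) for word, d in distances.items()}
    total = sum(weights.values())
    return {word: w / total for word, w in weights.items()}


def classify(test, templates):
    """Template matching: score a test utterance against one template per word."""
    distances = {word: dtw_distance(test, tpl) for word, tpl in templates.items()}
    return pseudo_pmf(distances)


def fuse(pmf_a, pmf_b):
    """Combine two classifiers over the same vocabulary.

    Assumption: product rule followed by renormalization.
    """
    joint = {word: pmf_a[word] * pmf_b[word] for word in pmf_a}
    total = sum(joint.values())
    return {word: p / total for word, p in joint.items()}
```

In use, `classify` would be called once with lip-motion templates and once with acoustic templates for the same utterance, and the word with the largest mass in `fuse(...)` would be taken as the combined decision. A product rule lets a confident classifier outweigh an uncertain one, which is one plausible way a combined system could outperform both of its parts in noise.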