Lipreading has become a hot research topic in recent years, since the visual information extracted from lip movement has been shown to improve the performance of automatic speech recognition (ASR) systems, especially in noisy environments [1]-[3], [5]. Two important issues arise in lipreading: 1) how to extract the most efficient features from lip image sequences, and 2) how to build lipreading models. This paper focuses mainly on how to choose more efficient features for lipreading.