The low-level acoustic-visual association reported by Yehia et al. (Speech Comm., 26(1):23-43, 1998) is exploited for audio-visual (AV) speech enhancement with natural video sequences. The aim of this study is to demonstrate that the redundant components of AV speech can be extracted with a suitable representation that does not involve any categorization process. Different types of audio features are compared, including the initial Line Spectral Pairs (LSP) and the envelope energy in four subbands. A gain measure of the enhancement is used for the comparison. The results clearly show that the coarse envelope features yield a better gain than the LSP.