Acoustically, car cabins are extremely noisy, and as a consequence audio-only in-car voice recognition systems perform poorly. Because the visual modality is immune to acoustic noise, using visual lip information from the driver is seen as a viable strategy for circumventing this problem via audio-visual automatic speech recognition (AVASR). However, implementing AVASR requires a system that can accurately locate and track the driver's face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method for locating and tracking the driver's lips for audio-visual speech recognition, despite visual variability in illumination and head pose.
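The real-time performance the abstract refers to rests on the integral-image (summed-area table) trick at the core of Viola-Jones, which makes any rectangular pixel sum, and hence any Haar-like feature, a constant-time computation. The sketch below illustrates that mechanism only; the function names and the simple two-rectangle feature are illustrative and not taken from the paper's implementation.

```python
def integral_image(img):
    """Build the summed-area table of a 2-D list of pixel values.
    ii[y][x] holds the sum of img over all rows < y and cols < x,
    so the table is one row and one column larger than the image."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels inside a rectangle using only four table lookups.
    This constant-time lookup is what lets Viola-Jones evaluate
    thousands of Haar-like features per detection window in real time."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])

def haar_two_rect(ii, top, left, height, width):
    """An illustrative two-rectangle Haar-like feature:
    sum of the left half minus sum of the right half,
    responding to vertical edges such as the side of a face."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

# Tiny worked example on a 4x4 "image"
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ii = integral_image(image)
print(rect_sum(ii, 0, 0, 4, 4))   # whole image: 136
print(rect_sum(ii, 1, 1, 2, 2))   # centre 2x2 block: 34
```

In the full detector, many such features are combined into a boosted cascade of classifiers that quickly rejects non-face windows, which is what makes scanning every position and scale of a video frame feasible at interactive rates.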