A key problem in building an interface in which the user controls a computer-generated display through unrestricted hand gestures is the ability to localize and track the human arm in image sequences. The paper proposes a multimodal localization scheme combined with a tracking framework that exploits the articulated structure of the arm. The localization stage uses the multiple cues of motion, shape, and color to locate a set of image features. Using constraint fusion, these features are tracked by a modified extended Kalman filter. An interaction scheme between tracking and localization is proposed to improve the estimation while reducing the computational requirements. Results of extensive simulations and experiments with real data are described, including a large database of hand gestures used in display control.
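The abstract does not specify how the paper's modified extended Kalman filter or its constraint-fusion measurement model are defined, so the following is only a minimal sketch of a generic EKF predict/update cycle to make the tracking idea concrete. All function names, matrices, and the constant-velocity example model are illustrative assumptions, not the paper's method.

```python
import numpy as np

def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
    """One generic EKF cycle (illustrative, not the paper's modified filter).

    x, P : prior state estimate and covariance
    z    : new measurement (e.g., fused image-feature positions)
    f, h : nonlinear dynamics and measurement functions
    """
    # Predict: propagate the state through the dynamics f.
    x_pred = f(x)
    F = F_jac(x)                          # Jacobian of f at x
    P_pred = F @ P @ F.T + Q

    # Update: correct the prediction with the measurement z via h.
    H = H_jac(x_pred)                     # Jacobian of h at x_pred
    y = z - h(x_pred)                     # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Hypothetical example: a constant-velocity 2-D point tracked from
# noisy position readings (state = [px, py, vx, vy]).
dt = 1.0
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
x, P = np.zeros(4), np.eye(4)
x, P = ekf_step(x, P, np.array([1.0, 0.5]),
                f=lambda s: F @ s, F_jac=lambda s: F,
                h=lambda s: H @ s, H_jac=lambda s: H,
                Q=0.01 * np.eye(4), R=0.1 * np.eye(2))
```

In the paper's setting, the state would instead encode the articulated arm configuration, and the measurement model would fuse the motion, shape, and color feature constraints; the linear model above only stands in for those unspecified details.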