We present a new system-level framework for automatically detecting and tracking the heads of multiple people in intelligent meeting rooms. We implement this approach with a distributed array of cameras that detects the meeting participants and continuously estimates each participant's head position and orientation in six degrees of freedom with fine precision. The initial position of each person is obtained with a set of face detectors coupled with a new iterative approach that resolves the 3D ambiguities arising from overlapping epipolar lines. The head pose is obtained from a hybrid estimation and tracking scheme that combines support vector regressors with a new multi-view, 3D model-based tracking system. The purpose of the system is to facilitate the automatic semantic analysis of group meetings. As an example application, we evaluate the system's ability to identify the person who receives the most visual attention, as indicated by the head direction of the other participants.
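To make the example application concrete, the following is a minimal sketch of how a visual-attention target could be chosen from head positions and head directions. It is not the method evaluated in this work: the function names, the nearest-angle voting rule, and the 30-degree threshold are all assumptions made for illustration, and the sketch assumes that every participant has a distinct 3D head position and a non-zero direction vector.

```python
import math
from collections import Counter


def angle_between(u, v):
    """Return the angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


def attention_target(i, positions, directions, max_angle=math.radians(30)):
    """Return the index of the person that person i is most likely facing.

    The target is the other participant whose head lies closest in angle
    to person i's head direction. Returns None if nobody falls within
    max_angle (an assumed, hypothetical threshold).
    """
    best, best_angle = None, max_angle
    for j, pos in enumerate(positions):
        if j == i:
            continue
        to_j = tuple(p - q for p, q in zip(pos, positions[i]))
        ang = angle_between(directions[i], to_j)
        if ang < best_angle:
            best, best_angle = j, ang
    return best


def most_attended(positions, directions, max_angle=math.radians(30)):
    """Return the index of the person faced by the most participants."""
    votes = Counter()
    for i in range(len(positions)):
        target = attention_target(i, positions, directions, max_angle)
        if target is not None:
            votes[target] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical scene: three head positions (metres) and head directions.
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
directions = [(1.0, 2.0, 0.0), (-1.0, 2.0, 0.0), (-1.0, -2.0, 0.0)]
print(most_attended(positions, directions))
```

In this hypothetical scene, persons 0 and 1 both face person 2, while person 2 faces person 0, so person 2 collects the most votes. A per-frame vote of this kind could be accumulated over time to summarize attention across a meeting.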