Abstract: This paper presents a real-time video surveillance system capable of tracking multiple persons and locating faces in moderately complex scenes. Rather than using heavily parameterized models to track foreground regions, we model objects by the bounding boxes that contain them. The algorithm integrates dynamic reference frame differencing with coarse motion estimation in a novel way to overcome the occlusion problems encountered in multiple object tracking. Change detection is performed by taking the difference between the current frame and a dynamic reference frame, which is adaptively updated over time to account for background changes, illumination variations, and similar effects. Video object segmentation maps this binary change detection map to an indexed segmentation map by using coarse directional information in addition to the size and position of connected foreground regions. We employ adaptive linear predictive filtering of the bounding box model in conjunction with the motion displacement estimates to accurately track multiple occluding objects. Once the video is segmented into foreground and background areas, we search within a subset of the foreground bounding boxes using chrominance histogram matching to detect facial regions.
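To make the change detection step concrete, the following is a minimal sketch, not the authors' implementation. It assumes grayscale frames stored as nested lists, a fixed difference threshold, a running-average update of the reference frame at background pixels, and 4-connected region labeling. The function names (`change_mask`, `update_reference`, `bounding_boxes`) and the parameters `threshold` and `alpha` are hypothetical and chosen only for illustration; the abstract does not specify the update rule or the thresholding used.

```python
from collections import deque


def change_mask(frame, reference, threshold=25):
    """Binary change detection map: 1 where |frame - reference| > threshold.

    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if abs(f - r) > threshold else 0 for f, r in zip(frow, rrow)]
            for frow, rrow in zip(frame, reference)]


def update_reference(frame, reference, mask, alpha=0.05):
    """Blend the current frame into the reference at background pixels only.

    A running average is one common choice of adaptive update (assumed here);
    pixels flagged as foreground keep their old reference value.
    """
    return [[r if m else (1 - alpha) * r + alpha * f
             for f, r, m in zip(frow, rrow, mrow)]
            for frow, rrow, mrow in zip(frame, reference, mask)]


def bounding_boxes(mask):
    """Return (top, left, bottom, right) for each 4-connected foreground region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 4x5 scene: uniform background with one small bright object.
reference = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in reference]
frame[1][1] = frame[1][2] = frame[2][1] = 200  # foreground object
frame[0][4] = 20                               # mild illumination drift

mask = change_mask(frame, reference)
boxes = bounding_boxes(mask)
reference = update_reference(frame, reference, mask, alpha=0.5)
```

In this sketch the mild drift at one pixel stays below the threshold, so it is absorbed into the reference frame rather than reported as foreground, while the bright object yields a single bounding box. Motion estimation, predictive filtering of the boxes, and chrominance histogram matching are not covered here.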