This paper addresses the problem of extracting video objects from MPEG compressed video. The only cues used for object segmentation are the motion vectors, which are sparse in MPEG. A method is presented for automatically estimating the number of objects and extracting independently moving video objects using motion vectors. First, the motion vectors are accumulated over a few frames to enhance the motion information, and the accumulated vectors are then spatially interpolated to obtain a dense motion vector field. The final segmentation is obtained by applying the expectation maximization (EM) algorithm to the dense motion vectors. A block-based affine clustering method is proposed for determining the appropriate number of motion models to be used in the EM step. Finally, the segmented objects are temporally tracked to obtain the video objects. This work has been carried out in the context of the emerging MPEG-4 standard, which aims at interactivity at the object level.
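The first stage described above (temporal accumulation followed by spatial interpolation) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the accumulation rule or the interpolation scheme, so this sketch assumes per-block averaging over frames and bilinear interpolation between block centres, and it treats blocks that never carry a vector as zero motion. The names `accumulate` and `densify` and the block-grid data layout are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float]
# One motion vector per block; None marks a block without one (e.g. intra-coded).
BlockField = List[List[Optional[Vector]]]


def accumulate(fields: List[BlockField]) -> List[List[Vector]]:
    """Average each block's available vectors over several frames (assumed rule)."""
    rows, cols = len(fields[0]), len(fields[0][0])
    out: List[List[Vector]] = [[(0.0, 0.0)] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            samples = [f[r][c] for f in fields if f[r][c] is not None]
            if samples:  # blocks with no vector in any frame stay at zero motion
                n = len(samples)
                out[r][c] = (sum(v[0] for v in samples) / n,
                             sum(v[1] for v in samples) / n)
    return out


def densify(field: List[List[Vector]], block: int) -> List[List[Vector]]:
    """Bilinearly interpolate block-centre vectors to one vector per pixel."""
    rows, cols = len(field), len(field[0])
    half = (block - 1) / 2.0  # offset of a block centre inside its block

    def locate(p: int, limit: int) -> Tuple[int, int, float]:
        # Fractional block coordinate of pixel p, clamped to the block grid.
        f = min(max((p - half) / block, 0.0), float(limit - 1))
        i0 = int(f)
        return i0, min(i0 + 1, limit - 1), f - i0

    dense: List[List[Vector]] = []
    for y in range(rows * block):
        r0, r1, wy = locate(y, rows)
        line: List[Vector] = []
        for x in range(cols * block):
            c0, c1, wx = locate(x, cols)
            pixel = []
            for k in (0, 1):  # k = 0: horizontal component, k = 1: vertical
                top = field[r0][c0][k] * (1 - wx) + field[r0][c1][k] * wx
                bot = field[r1][c0][k] * (1 - wx) + field[r1][c1][k] * wx
                pixel.append(top * (1 - wy) + bot * wy)
            line.append((pixel[0], pixel[1]))
        dense.append(line)
    return dense
```

Calling `densify(accumulate(frames), block=16)` on a list of per-frame block fields would yield one vector per pixel for 16x16 macroblocks; a dense field of this kind is what the EM segmentation step would take as input.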