We propose a coherent approach to extracting key-frames within a video shot for object-based video segmentation. A unified feature space is first constructed to represent video frames and visual objects simultaneously in a joint spatio-temporal domain, and key-frame extraction is formulated as a feature selection process that aims to maximize the cluster divergence of video objects by selecting an optimal set of key-frames. Specifically, two different criteria are used to achieve joint key-frame extraction and object segmentation. The first criterion selects the key-frames that yield the maximum pairwise interclass divergence between objects in the feature space. The second aims at maximizing the marginal divergence of objects in each frame. Simulations with both synthetic and real video data demonstrate the efficiency and robustness of the proposed methods.
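As a rough illustration of key-frame extraction viewed as feature selection, the sketch below greedily picks the frames that keep every pair of objects as separable as possible. It is not the formulation used in this work: the 1-D Gaussian object model per frame, the independence of frames, the symmetric KL divergence, the greedy max-min search, and the names `sym_kl_gauss`, `frame_divergences`, and `select_key_frames` are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def sym_kl_gauss(m1, v1, m2, v2):
    """Symmetric KL (J) divergence between two 1-D Gaussians (assumed object model)."""
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0) + 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)


def frame_divergences(means, variances):
    """Return an (n_pairs, n_frames) array of per-frame divergences for each object pair.

    means, variances: arrays of shape (n_objects, n_frames), one Gaussian per
    object and frame (a hypothetical simplification of the feature space).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    pairs = combinations(range(means.shape[0]), 2)
    return np.array([
        sym_kl_gauss(means[i], variances[i], means[j], variances[j])
        for i, j in pairs
    ])


def select_key_frames(means, variances, k):
    """Greedily choose k frames that maximize the smallest pairwise divergence.

    Frames are assumed independent, so divergences add across selected frames.
    Ties on the minimum are broken by the total divergence over all pairs.
    """
    div = frame_divergences(means, variances)
    n_frames = div.shape[1]
    selected = []
    accumulated = np.zeros(div.shape[0])
    for _ in range(min(k, n_frames)):
        best_frame, best_key = None, None
        for f in range(n_frames):
            if f in selected:
                continue
            total = accumulated + div[:, f]
            key = (total.min(), total.sum())
            if best_key is None or key > best_key:
                best_frame, best_key = f, key
        selected.append(best_frame)
        accumulated = accumulated + div[:, best_frame]
    return sorted(selected)


# Toy data: 3 objects, 4 frames. The objects differ only in frames 1 and 2.
toy_means = [[0, 0, 0, 5],
             [0, 3, 0, 5],
             [0, 0, 4, 5]]
toy_vars = np.ones((3, 4))
print(select_key_frames(toy_means, toy_vars, 2))  # -> [1, 2]
```

In this toy case frames 0 and 3 carry no discriminative information, so they are never preferred. A per-frame ranking of the same divergence table would loosely correspond to a marginal, frame-by-frame criterion, again under the assumptions stated above.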