In this thesis, we present a clustering-based framework for efficient, unsupervised video content analysis and management. The proposed algorithms implement the components of a complete visual content management system, and provide such key functionalities as navigation/browsing, visual summarization/compaction, color-based retrieval, and (pseudo-)semantic annotation. The group-of-frames video representation scheme and color descriptors that we introduce have become normative elements of the new multimedia standard MPEG-7, and the proposed analysis tools can be used to generate components of MPEG-7-compliant content descriptions.

We first introduce a crisp clustering algorithm to identify, in near real-time and without the need for threshold selection, the individual shots within a video sequence. Histogram-based descriptors are then defined for joint representation of the color features of a collection of frames. The family of alpha-trimmed average histograms provides a robust set of color descriptors that can eliminate the effects of aberrant frames within a shot or video segment. The intersection histogram, on the other hand, reflects the number of pixels of a given color that are common to all frames of interest.

These color descriptors are subsequently utilized as part of a fuzzy clustering algorithm to determine and extract the optimum, non-redundant set of representative frames for each segment in a sequence. The approach utilizes associated components of the MPEG-7 standard and generates hierarchical video summaries. The frame extraction process is largely unsupervised, and the resulting visual summary can be dynamically updated according to user preferences during a browsing session. In addition to these generic, domain-independent methods, a fuzzy classification technique is developed for unsupervised, domain-dependent video sequence analysis and shot labeling.
While the algorithm requires pre-computed reference models that characterize each domain, these models are established (semi-)automatically during a training stage, again through the use of fuzzy clustering and cluster validation methods. A fuzzy nearest-neighbor/prototype classifier is then used to assign unlabeled video shots to the available classes. The fuzzy membership values associated with each shot allow erroneous classifications to be avoided when an input sample lacks strong affiliation with any of the categories.
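
The two group-of-frames color descriptors mentioned above can be illustrated with a minimal sketch. It is not the implementation developed in the thesis. It assumes that each frame is already summarized by a color histogram with the same number of bins, that the alpha-trimmed average discards the floor(alpha * N) smallest and largest values in each bin before averaging, and that the intersection histogram is the per-bin minimum over the frames. The function names and the sample data are hypothetical.

```python
from math import floor


def alpha_trimmed_average_histogram(histograms, alpha):
    """Per-bin trimmed mean over a group of frame histograms.

    Assumed definition: for each bin, sort the values contributed by the
    N frames, drop the floor(alpha * N) smallest and largest, and average
    the remainder. alpha = 0 gives the plain average histogram.
    """
    if not histograms:
        raise ValueError("at least one frame histogram is required")
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    n = len(histograms)
    trim = floor(alpha * n)
    result = []
    # zip(*histograms) yields, for each bin, the values from all frames.
    for bin_values in zip(*histograms):
        kept = sorted(bin_values)[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result


def intersection_histogram(histograms):
    """Per-bin minimum: pixel count of each color shared by all frames."""
    if not histograms:
        raise ValueError("at least one frame histogram is required")
    return [min(bin_values) for bin_values in zip(*histograms)]


if __name__ == "__main__":
    # Three similar frames and one aberrant frame (made-up 3-bin data).
    frames = [[10, 0, 6], [12, 0, 4], [11, 1, 4], [0, 16, 0]]
    print(alpha_trimmed_average_histogram(frames, 0.0))
    print(alpha_trimmed_average_histogram(frames, 0.25))
    print(intersection_histogram(frames))
```

In this toy data the untrimmed average is pulled toward the aberrant fourth frame, while trimming with alpha = 0.25 discards its extreme values in every bin, which is the robustness property the descriptors are intended to provide.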