In this paper, we propose a framework that exploits visual and disparity information to cluster shots from stereoscopic videos into groups that correspond to semantic concepts. Various color, disparity, and texture descriptors are applied to shot key frames to obtain low-level representations. Self-Organizing Maps are subsequently trained on various combinations of these representations in order to determine a lattice of representative semantic concepts. Experimental results on stereoscopic videos of performances and football show that the use of disparity information leads to better clustering than the use of visual information alone.
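As a rough illustration of the pipeline outlined above, the following sketch concatenates per-key-frame descriptor vectors and trains a small Self-Organizing Map on them. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names (`concat_descriptors`, `train_som`, `best_matching_unit`), the lattice size, the linearly decaying learning rate and Gaussian neighborhood, and the toy descriptor values are all hypothetical choices made for illustration. Only the Python standard library is used.

```python
import math
import random


def concat_descriptors(*parts):
    """Concatenate descriptor vectors (e.g. color, texture, disparity) into one list."""
    return [value for part in parts for value in part]


def best_matching_unit(x, weights):
    """Return the index of the lattice node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, weights[k])),
    )


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns one weight vector per lattice node.

    Assumed schedule (not taken from the paper): learning rate and
    neighborhood width both decay linearly over the training steps.
    """
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize every node from a randomly chosen training sample.
    weights = [list(rng.choice(data)) for _ in nodes]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            x = data[i]
            br, bc = nodes[best_matching_unit(x, weights)]
            for k, (r, c) in enumerate(nodes):
                # Gaussian neighborhood around the winning node on the lattice.
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                w = weights[k]
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy key-frame descriptors (made-up numbers): color, texture, disparity.
    shots = [
        concat_descriptors([0.9, 0.1], [0.2], [0.8]),
        concat_descriptors([0.8, 0.2], [0.3], [0.7]),
        concat_descriptors([0.1, 0.9], [0.7], [0.1]),
        concat_descriptors([0.2, 0.8], [0.6], [0.2]),
    ]
    som = train_som(shots, rows=2, cols=2, epochs=20)
    # Each shot is assigned to the lattice node that best matches its key frame.
    print([best_matching_unit(s, som) for s in shots])
```

Dropping the disparity part from `concat_descriptors` gives a visual-only baseline of the kind the comparison above refers to; the two resulting assignments can then be evaluated against ground-truth concept labels.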