An efficient algorithm for retrieving humans from large video databases is presented in this paper. Such retrieval is useful for a variety of applications, including video surveillance for security purposes and speaker identification systems. A human face and body detector, based on a simple probabilistic model, is first proposed to approximately estimate human face and body regions. The adopted approach significantly reduces the required computational cost and at the same time exploits information already available in MPEG-coded video data. A segmentation fusion scheme is then applied to improve segmentation accuracy. Based on the resulting segmentation map, a graph is constructed that represents the spatial relationships among the extracted segments. Color, texture, motion and shape characteristics are attached to the nodes of the graph as additional features. To enhance the flexibility of the proposed system, each node is further decomposed into other graphs (sub-graphs), resulting in a pyramidal graph representation of the visual content.
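As a rough illustration of the graph construction described above, the following minimal Python sketch builds an adjacency graph from a segmentation label map and attaches a sub-graph to one node. It is not the paper's implementation: the names SegmentNode, SegmentGraph, build_graph and attach_subgraph are hypothetical, 4-connectivity between pixels is an assumption, and pixel area stands in for the color, texture, motion and shape features, which would be stored in the same features dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SegmentNode:
    """One segment; `features` would also hold color, texture, motion, shape."""
    label: int
    features: Dict[str, float] = field(default_factory=dict)
    subgraph: Optional["SegmentGraph"] = None  # next, finer pyramid level


@dataclass
class SegmentGraph:
    nodes: Dict[int, SegmentNode] = field(default_factory=dict)
    edges: Dict[int, Set[int]] = field(default_factory=dict)


def build_graph(label_map: List[List[int]],
                mask: Optional[List[List[bool]]] = None) -> SegmentGraph:
    """Build a graph whose edges link segments that touch (4-connectivity).

    If `mask` is given, only pixels where mask[r][c] is True are considered.
    """
    graph = SegmentGraph()
    rows = len(label_map)
    cols = len(label_map[0]) if rows else 0

    def active(r: int, c: int) -> bool:
        return mask is None or mask[r][c]

    for r in range(rows):
        for c in range(cols):
            if not active(r, c):
                continue
            a = label_map[r][c]
            if a not in graph.nodes:
                graph.nodes[a] = SegmentNode(a, {"area": 0.0})
                graph.edges[a] = set()
            graph.nodes[a].features["area"] += 1.0

    for r in range(rows):
        for c in range(cols):
            if not active(r, c):
                continue
            a = label_map[r][c]
            # Checking only the right and lower neighbours visits each pair once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols and active(rr, cc):
                    b = label_map[rr][cc]
                    if a != b:
                        graph.edges[a].add(b)
                        graph.edges[b].add(a)
    return graph


def attach_subgraph(graph: SegmentGraph, label: int,
                    coarse_map: List[List[int]],
                    fine_map: List[List[int]]) -> None:
    """Decompose one node into a sub-graph built from a finer segmentation."""
    mask = [[value == label for value in row] for row in coarse_map]
    graph.nodes[label].subgraph = build_graph(fine_map, mask)


# Toy example: three coarse segments; segment 1 splits into 10 and 11.
coarse = [[1, 1, 2, 3],
          [1, 1, 2, 3]]
fine = [[10, 11, 20, 30],
        [10, 11, 20, 30]]

graph = build_graph(coarse)
attach_subgraph(graph, 1, coarse, fine)
```

In this toy example, segment 2 touches both segments 1 and 3, while 1 and 3 are not neighbours; repeating attach_subgraph on the nodes of each sub-graph would yield deeper levels of the pyramid.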