Occlusion in dynamic or cluttered scenes is a critical issue in multi-object tracking. Some methods formulate this problem with latent variables and achieve state-of-the-art performance, but an exact solution to such formulations is computationally intractable. In this paper, we present a hierarchical association framework to address occlusion in complex scenes captured by a single camera. In the first stage, reliable tracklets are obtained by frame-to-frame association of detection responses in a flow network. We then formulate the tracklet association problem as a spatio-temporal clustering model that represents the problem as faithfully as possible. Because the affinity model plays an important role in our formulation, we construct a sparsity-induced affinity model under the assumption that a detection sample in a tracklet can be efficiently represented by another tracklet belonging to the same object. Furthermore, we give a near-optimal algorithm based on a globally greedy strategy to solve the spatio-temporal clustering problem, which runs in time linear in the number of tracklets. We quantitatively evaluate our method on three challenging data sets and achieve a significant improvement over state-of-the-art tracking systems.
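The second-stage idea of linking tracklets by a globally greedy strategy can be illustrated with a minimal sketch. This is not the algorithm of the paper: the `Tracklet` class, the `greedy_link` function, the `threshold` parameter, and the precomputed `affinity` matrix are all hypothetical names introduced here, and the sparsity-induced affinity model itself is not reproduced. The sketch also enumerates all ordered pairs, so it is quadratic rather than linear in the number of tracklets.

```python
from dataclasses import dataclass


@dataclass
class Tracklet:
    start: int  # index of the first frame covered by the tracklet
    end: int    # index of the last frame covered by the tracklet


def greedy_link(tracklets, affinity, threshold=0.5):
    """Link tracklets into tracks, always taking the best remaining pair.

    affinity[i][j] is an assumed, precomputed score for tracklet j
    continuing tracklet i. Only pairs where i ends before j starts
    are considered, so links always point forward in time.
    """
    n = len(tracklets)
    candidates = [
        (affinity[i][j], i, j)
        for i in range(n)
        for j in range(n)
        if i != j
        and tracklets[i].end < tracklets[j].start
        and affinity[i][j] >= threshold
    ]
    # Globally greedy: examine candidate links from highest score down.
    candidates.sort(reverse=True)

    successor, predecessor = {}, {}
    for _, i, j in candidates:
        # Each tracklet may have at most one successor and one predecessor.
        if i in successor or j in predecessor:
            continue
        successor[i] = j
        predecessor[j] = i

    # Follow successor links from every tracklet that has no predecessor.
    tracks = []
    for i in range(n):
        if i in predecessor:
            continue
        chain = [i]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        tracks.append(chain)
    return tracks


tracklets = [Tracklet(0, 10), Tracklet(12, 20), Tracklet(0, 8), Tracklet(11, 19)]
affinity = [
    [0.0, 0.9, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
tracks = greedy_link(tracklets, affinity)
print(tracks)
```

In this toy input, tracklets 0 and 2 overlap in time and so can never be linked to each other; the greedy pass pairs each with its highest-scoring later tracklet, and the weaker candidate link from 0 to 3 is discarded once 0 already has a successor.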