In this paper, we propose an efficient part-based approach for action recognition. The main idea is to recognize human actions from the less-occluded body parts, without relying on a large set of part filters; the approach is therefore robust to occlusion and computationally inexpensive. We extract spatiotemporal features from RGB-D videos and assign a part label to each feature. Then, for each part, a recognition score is computed for each action class using a pyramid-structured bag-of-words (BoW-Pyramid) representation. The final result is determined by a weighted sum of these scores together with contextual information, which is based on the ratio of features between every pair of parts. This work makes several contributions. First, the proposed part-based method is robust to occlusion and operates online. Second, our BoW-Pyramid representation can distinguish actions with reversed temporal orders. Third, recognition accuracy is improved by incorporating contextual information. Experimental results verify the effectiveness of our method and demonstrate its promise of surpassing state-of-the-art approaches.
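To make the fusion step concrete, the following is a minimal sketch of the final scoring stage: each part contributes a BoW-Pyramid score per class, parts are weighted (e.g., occluded parts down-weighted), and a contextual term is added before taking the best class. The function name `fuse_scores`, the balancing parameter `alpha`, and the exact fusion rule are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Assumed shapes: P parts, C action classes.
# part_scores[p, c] : BoW-Pyramid recognition score of part p for class c
# part_weights[p]   : per-part weight (e.g., lowered when part p is occluded)
# context_scores[c] : score derived from contextual information, i.e. the
#                     ratio of features between every pair of parts
def fuse_scores(part_scores, part_weights, context_scores, alpha=0.5):
    """Weighted sum of per-part scores plus a contextual term (a sketch;
    `alpha` and the additive combination are hypothetical choices)."""
    part_term = part_weights @ part_scores            # (C,) weighted part evidence
    combined = alpha * part_term + (1.0 - alpha) * context_scores
    return int(np.argmax(combined))                   # predicted action class

# Toy usage: 3 parts, 4 classes; the second part is treated as occluded.
rng = np.random.default_rng(0)
scores = rng.random((3, 4))
weights = np.array([0.5, 0.1, 0.4])                   # occluded part contributes less
context = rng.random(4)
print(fuse_scores(scores, weights, context))
```

Because each part is scored independently, an occluded part can simply be down-weighted rather than invalidating the whole observation, which is what makes the method tolerant to partial occlusion.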