In this paper, we study one-shot learning gesture recognition on RGB-D data recorded from Microsoft's Kinect. To this end, we propose a novel bag of manifold words (BoMW) based feature representation on symmetric positive definite (SPD) manifolds. In particular, we use covariance matrices to extract local features from RGB-D data because of their compact representation and the convenience with which they fuse RGB and depth information. Since covariance matrices are SPD matrices and the space they span is the SPD manifold, traditional learning methods designed for Euclidean space, such as sparse coding, cannot be applied to them directly. To overcome this problem, we propose a unified framework that transfers sparse coding on SPD manifolds to sparse coding in Euclidean space, which allows any existing learning method to be used. After building a BoMW representation of a video from each gesture class, a nearest neighbour classifier is adopted to perform one-shot learning gesture recognition. Experimental results on the ChaLearn gesture dataset demonstrate the outstanding performance of the proposed one-shot learning gesture recognition method compared with state-of-the-art methods. The effectiveness of the proposed feature extraction method is also validated on a new RGB-D action recognition dataset.
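As a rough illustration of the covariance-based local feature and of moving an SPD matrix into a Euclidean space, consider the following minimal sketch. It is not the framework proposed in the paper: the abstract does not specify the mapping, so the sketch assumes the common log-Euclidean embedding (matrix logarithm followed by vectorisation). The function names, the four-channel feature layout (R, G, B, depth), and the random input patch are all hypothetical.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of per-pixel feature vectors.

    features: array of shape (n_pixels, d), one feature vector per pixel.
    A small ridge (eps) is added so the result is strictly positive definite.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    return cov + eps * np.eye(features.shape[1])


def log_euclidean_vector(spd):
    """Map an SPD matrix to a Euclidean vector (assumed log-Euclidean embedding).

    The matrix logarithm is computed from the eigendecomposition; off-diagonal
    entries are scaled by sqrt(2) so that the vector's Euclidean norm equals the
    Frobenius norm of the log-matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_spd = (eigvecs * np.log(eigvals)) @ eigvecs.T
    upper = np.triu_indices(spd.shape[0], k=1)
    return np.concatenate([np.diag(log_spd), np.sqrt(2.0) * log_spd[upper]])


# Hypothetical local region: 200 pixels, 4 features each (R, G, B, depth).
rng = np.random.default_rng(0)
patch = rng.normal(size=(200, 4))

C = covariance_descriptor(patch)   # 4 x 4 SPD matrix
vec = log_euclidean_vector(C)      # Euclidean vector of length 4 * 5 / 2
```

Once local descriptors are ordinary vectors such as `vec`, standard Euclidean tools (codebook learning, sparse coding, nearest neighbour search) can be applied to them, which is the general motivation the abstract describes.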