This disclosure relates generally to hand-gesture recognition, and more particularly to a system and method for detecting interaction of 3D dynamic hand gestures with frugal augmented reality (AR) devices. In one embodiment, a method for hand-gesture recognition includes receiving frames of a media stream of a scene captured from a first-person view (FPV) of a user using an RGB sensor communicably coupled to a wearable AR device. The media stream includes RGB image data associated with the frames of the scene. The scene comprises a dynamic hand gesture performed by the user. Temporal information associated with the dynamic hand gesture is estimated from the RGB image data using a deep learning model. The estimated temporal information is associated with hand poses of the user and comprises key-points identified on the user's hand in the frames. Based on this temporal information, the dynamic hand gesture is classified into predefined gesture classes using a multi-layered long short-term memory (LSTM) classification network.
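By way of non-limiting illustration, the classification step described above, in which a sequence of per-frame hand key-points is passed through a multi-layered LSTM network and mapped to a gesture class, may be sketched as follows. The sketch uses only the Python standard library and is not the disclosed implementation. The class names `LSTMLayer` and `GestureClassifier`, the choice of 21 two-dimensional key-points per frame, the hidden size, the number of layers, and the gesture labels are all assumptions introduced for the example. The weights are random and untrained, so the predicted label is arbitrary; only the data flow is meaningful.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class LSTMLayer:
    """One LSTM layer with random, untrained weights (illustrative only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # Rows are grouped by gate: input, forget, cell candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def forward(self, sequence):
        H = self.hidden_size
        h = [0.0] * H  # hidden state
        c = [0.0] * H  # cell state
        outputs = []
        for x in sequence:
            xh = list(x) + h
            z = [sum(w * v for w, v in zip(row, xh)) + b
                 for row, b in zip(self.W, self.b)]
            i = [sigmoid(v) for v in z[0:H]]
            f = [sigmoid(v) for v in z[H:2 * H]]
            g = [math.tanh(v) for v in z[2 * H:3 * H]]
            o = [sigmoid(v) for v in z[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs  # one hidden vector per frame


class GestureClassifier:
    """Stacked LSTM layers followed by a softmax over gesture classes."""

    def __init__(self, num_keypoints, hidden_size, num_layers, class_names, seed=0):
        rng = random.Random(seed)
        self.class_names = list(class_names)
        sizes = [2 * num_keypoints] + [hidden_size] * num_layers
        self.layers = [LSTMLayer(sizes[k], sizes[k + 1], rng)
                       for k in range(num_layers)]
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                      for _ in self.class_names]

    def predict_proba(self, keypoint_frames):
        # Flatten each frame's (x, y) key-points into one feature vector.
        seq = [[coord for point in frame for coord in point]
               for frame in keypoint_frames]
        for layer in self.layers:
            seq = layer.forward(seq)
        last = seq[-1]  # hidden state after the final frame
        logits = [sum(w * v for w, v in zip(row, last)) for row in self.W_out]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def classify(self, keypoint_frames):
        probs = self.predict_proba(keypoint_frames)
        return self.class_names[probs.index(max(probs))]


# Hypothetical input: 10 frames, each with 21 normalized (x, y) key-points.
rng = random.Random(1)
frames = [[(rng.random(), rng.random()) for _ in range(21)] for _ in range(10)]

# Hypothetical gesture labels; the disclosure does not name the classes here.
model = GestureClassifier(num_keypoints=21, hidden_size=8, num_layers=2,
                          class_names=["swipe_left", "swipe_right", "tap"])
probs = model.predict_proba(frames)
label = model.classify(frames)
```

In such a sketch, the key-points for each frame would be supplied by the deep learning model that estimates the temporal information from the RGB image data, and the network weights would be learned from labeled gesture sequences rather than drawn at random.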