We propose a multi-layer RNN for sign language detection. The system uses features extracted automatically by a two-stream convolutional neural network (CNN) that takes video frames and motion data as input. We also created a dataset of videos containing signing "in the wild" for training and evaluation. Compared against the prior state-of-the-art, our system achieves an improvement of approximately 18%, indicating that the network is able to exploit the dynamics of hand motion during detection.
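The pipeline described above (per-frame features from two CNN streams, fused and fed to a stacked RNN that emits a signing/not-signing score per frame) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the two-layer depth, fusion by concatenation, and the stubbed random features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(x_seq, W_x, W_h, b):
    """Simple tanh RNN over a sequence; returns the hidden state at each step."""
    h = np.zeros(W_h.shape[0])
    out = []
    for x in x_seq:
        h = np.tanh(W_x @ x + W_h @ h + b)
        out.append(h)
    return np.stack(out)

T, d_rgb, d_flow, d_h = 16, 64, 64, 32  # illustrative sizes

# Per-frame features from the two CNN streams (appearance and motion),
# stubbed here with random vectors in place of real CNN outputs.
rgb_feats  = rng.standard_normal((T, d_rgb))
flow_feats = rng.standard_normal((T, d_flow))

# Fuse the streams by concatenation, then run two stacked RNN layers.
fused = np.concatenate([rgb_feats, flow_feats], axis=1)  # shape (T, 128)

W_x1 = rng.standard_normal((d_h, d_rgb + d_flow)) * 0.1
W_h1 = rng.standard_normal((d_h, d_h)) * 0.1
b1   = np.zeros(d_h)
W_x2 = rng.standard_normal((d_h, d_h)) * 0.1
W_h2 = rng.standard_normal((d_h, d_h)) * 0.1
b2   = np.zeros(d_h)

h1 = rnn_layer(fused, W_x1, W_h1, b1)
h2 = rnn_layer(h1, W_x2, W_h2, b2)

# One signing/not-signing logit per frame from the top hidden states.
w_out = rng.standard_normal(d_h) * 0.1
logits = h2 @ w_out
print(logits.shape)  # one score per frame: (16,)
```

The per-frame output reflects the detection (rather than recognition) framing: the model labels each moment of the video as signing or not signing, and the recurrence is what lets it accumulate motion dynamics over time.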