The great advances that Convolutional Neural Networks (ConvNets) brought to image recognition have encouraged many researchers to apply ConvNets to video understanding tasks such as human action recognition. However, state-of-the-art approaches differ in various aspects, which makes their results difficult to compare. This work compares several of these approaches in a shared environment using a standard protocol. The results indicate how effective the fundamental ideas behind the proposed approaches are, and which other factors influence performance. Based on these findings, several adaptations of the parameters and methods were implemented and tested. Human action recognition is commonly approached by addressing two complementary aspects of vision. The first relies on the appearance of the objects, scenes, and human poses shown, and can be considered a regular image recognition task. The second uses optical flow estimates to exploit motion information. Since image recognition is already a well-researched area, it is the temporal aspect that needs further investigation. The studies therefore focused on optical flow, which allowed a deeper investigation of this less-researched sub-discipline.