In this paper, we propose an accurate and effective method for detecting abnormal behavior. We treat a video as a sequence of frames. In the training phase, our deep learning framework extracts appearance features and learns the relationship between historical features and current features in normal video. In the testing phase, actual features that differ from the predicted features are considered abnormal. Our model is designed as a feature prediction framework with a new temporal attention mechanism. In the feature extraction stage, we transform a pre-trained VGG16 network into a fully convolutional network and use the output of its third pooling layer as the appearance feature, which effectively captures static appearance. Then, a new temporal attention mechanism learns the contribution of the historical appearance features at each position to the current features at that position, thereby addressing the problem of representing dynamic motion. Finally, an LSTM network decodes the attention-weighted historical feature sequence to predict the features at the current moment. Upsampling the abnormal features localizes the abnormal behavior on the original frames. Experiments on two benchmark datasets demonstrate that our method performs competitively with state-of-the-art methods.
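The per-position temporal attention and the upsampled error map described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the scaled dot-product scoring against the most recent feature, the function names `temporal_attention` and `anomaly_map`, and the nearest-neighbour upsampling are all assumptions made for illustration, and the LSTM decoder is omitted entirely. The stride of 8 reflects the fact that the third pooling layer of VGG16 downsamples the input by a factor of 8.

```python
import numpy as np


def temporal_attention(history):
    """Weight T historical feature maps independently at each spatial position.

    history: array of shape (T, C, H, W).
    Returns the attended context (C, H, W) and the weights (T, H, W).
    Assumption: scores are scaled dot products with the most recent feature;
    the paper's actual scoring function is not given in the abstract.
    """
    T, C, H, W = history.shape
    query = history[-1]
    # One score per time step and per position, computed over the channel axis.
    scores = np.einsum("tchw,chw->thw", history, query) / np.sqrt(C)
    # Numerically stable softmax over the time axis.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Attention-weighted sum of the historical features at each position.
    context = np.einsum("thw,tchw->chw", weights, history)
    return context, weights


def anomaly_map(predicted, actual, stride=8):
    """Per-position L2 prediction error, upsampled to frame resolution.

    predicted, actual: arrays of shape (C, H, W).
    Assumption: nearest-neighbour upsampling by `stride`.
    """
    err = np.linalg.norm(predicted - actual, axis=0)  # shape (H, W)
    # Repeat every error value over a stride x stride block of pixels.
    return np.kron(err, np.ones((stride, stride)))
```

In a full pipeline, the attended context would be fed to the LSTM decoder, and the decoder's output, rather than the context itself, would be compared with the actual features; thresholding the resulting map would then mark abnormal regions on the frame.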