Systems, methods, and computer-readable media are disclosed for video anomaly detection using a learning model. The method may include receiving a video including a plurality of image frames captured by a camera, the plurality of image frames including a current image frame corresponding to a selected time point and a set of prior image frames corresponding to time points prior to the selected time point. The method may also include extracting spatial features from each prior image frame at different resolutions and, for each resolution, predicting an estimated image frame based on the spatial features of the set of prior image frames at that resolution. The method may further include predicting an estimated current image frame based on the estimated image frames at the different resolutions and detecting a video anomaly based on a difference between the captured current image frame and the estimated current image frame.
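
The flow described above (multi-resolution processing of prior frames, a per-resolution estimate, fusion into one estimated current frame, and a difference-based decision) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: average pooling stands in for spatial feature extraction, per-pixel linear extrapolation stands in for the learning model, the per-resolution estimates are fused by simple averaging, the difference is measured as mean squared error, and the names `downsample`, `upsample`, `predict_frame`, `estimate_current_frame`, `anomaly_score`, and `THRESHOLD` are hypothetical. Frame dimensions are assumed to be divisible by every resolution factor.

```python
from statistics import fmean


def downsample(frame, factor):
    """Average-pool a grayscale frame over factor x factor blocks."""
    rows, cols = len(frame) // factor, len(frame[0]) // factor
    return [
        [
            fmean(
                frame[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def upsample(frame, factor):
    """Nearest-neighbour upsampling back toward the full resolution."""
    return [
        [value for value in row for _ in range(factor)]
        for row in frame
        for _ in range(factor)
    ]


def predict_frame(prior_frames):
    """Stand-in for the learning model (assumption): extrapolate each pixel
    linearly from the two most recent prior frames."""
    last, previous = prior_frames[-1], prior_frames[-2]
    return [
        [2.0 * a - b for a, b in zip(row_last, row_prev)]
        for row_last, row_prev in zip(last, previous)
    ]


def estimate_current_frame(prior_frames, factors=(1, 2, 4)):
    """Predict one frame per resolution, then fuse the predictions by averaging."""
    estimates = []
    for factor in factors:
        scaled = [downsample(frame, factor) for frame in prior_frames]
        estimates.append(upsample(predict_frame(scaled), factor))
    rows, cols = len(estimates[0]), len(estimates[0][0])
    return [
        [fmean(est[r][c] for est in estimates) for c in range(cols)]
        for r in range(rows)
    ]


def anomaly_score(current, estimated):
    """Mean squared difference between the captured and the estimated frame."""
    return fmean(
        (a - b) ** 2
        for row_cur, row_est in zip(current, estimated)
        for a, b in zip(row_cur, row_est)
    )


# Synthetic 8x8 video: brightness rises by 0.1 per frame over four prior frames.
prior = [[[0.1 * t] * 8 for _ in range(8)] for t in range(4)]
estimated = estimate_current_frame(prior)

normal = [[0.4] * 8 for _ in range(8)]   # continues the brightness trend
odd = [row[:] for row in normal]
for r in range(2, 4):
    for c in range(2, 4):
        odd[r][c] = 1.0                  # unexpected bright 2x2 patch

THRESHOLD = 0.01  # illustrative value, not from the disclosure
print(anomaly_score(normal, estimated) > THRESHOLD)  # False
print(anomaly_score(odd, estimated) > THRESHOLD)     # True
```

In a real system the placeholder `predict_frame` would be replaced by the trained model, and the threshold would be chosen from the score distribution observed on anomaly-free video.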