The present invention relates to a violence detection framework that analyzes the spatial and temporal characteristics of video images using deep learning, with the aim of improving the ability and accuracy of detecting violence in video. To this end, the framework detects violent content by locating feature points of violence in an input video composed of frames supplied by a video camera or a video file. The method includes: a first step of separating the input video into individual frame images; a second step of extracting a 2D Y-frame black-and-white (grayscale) image from each separated frame image by excluding the red (R), green (G), and blue (B) components; a third step of sequentially accumulating a large number of the extracted 2D Y-frame black-and-white images and converting them into Y-frame black-and-white images in a 3D environment; and a fourth step of extracting and accumulating frames of equal layers from the converted 3D Y-frame black-and-white images, performing image convolution on them, and deriving the desired detection scene using 3*3*3 filters. A video optimized in network weights and in time and space is thereby generated and applied to the algorithm, so that the feature points of violence are assigned to specific layers during the video convolution process. By continuously memorizing and re-learning these feature points, the framework improves the violence detection ability and accuracy for video, enables analysis regardless of the length of the analyzed frame sequence, and enables analysis of continuous behavior.
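The second to fourth steps can be illustrated with a minimal Python sketch; it is not the claimed implementation. It rests on several assumptions that the abstract does not state: the Y frame is computed with ITU-R BT.601 luma weights, the function names `to_luma`, `stack_frames`, and `conv3d_valid` are hypothetical, a single averaging 3*3*3 kernel stands in for the learned filters, and NumPy 1.20 or later is available.

```python
import numpy as np


def to_luma(frame_rgb):
    """Reduce an (H, W, 3) RGB frame to a 2D Y frame.

    Assumption: BT.601 weights, Y = 0.299 R + 0.587 G + 0.114 B.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb.astype(np.float64) @ weights


def stack_frames(frames_rgb):
    """Accumulate 2D Y frames along a new time axis, giving (T, H, W)."""
    return np.stack([to_luma(f) for f in frames_rgb], axis=0)


def conv3d_valid(volume, kernel):
    """Valid-mode 3D cross-correlation, the operation CNN layers call convolution."""
    # windows has shape (T-2, H-2, W-2, 3, 3, 3) for a 3*3*3 kernel.
    windows = np.lib.stride_tricks.sliding_window_view(volume, kernel.shape)
    return np.einsum("thwijk,ijk->thw", windows, kernel)


# Synthetic stand-in for separated video frames: 5 RGB frames of 8x8 pixels.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(5)]

volume = stack_frames(frames)            # 3D Y-frame volume, shape (5, 8, 8)
kernel = np.full((3, 3, 3), 1.0 / 27.0)  # placeholder for a learned filter
features = conv3d_valid(volume, kernel)  # spatio-temporal feature map

print(volume.shape, features.shape)      # (5, 8, 8) (3, 6, 6)
```

Because the kernel spans three consecutive frames as well as a 3*3 spatial neighbourhood, each output value mixes information across time and space, which is what allows a sequence of frames to be analyzed as continuous behavior. In a trained network the kernel values would be learned weights, and many such filters would be applied per layer.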