According to an aspect of the present invention, a deep learning-based face image extraction device includes a plurality of face detection units composed of neural networks of different depths, and is capable of performing rapid and accurate face recognition by applying progressively deeper networks while sequentially reducing the number of input images. The device comprises: a first face detection unit that generates feature maps of a plurality of input images using a first neural network having a first depth and, based on those feature maps, primarily selects a plurality of first sub-input images that include a face region from among the plurality of input images; a second face detection unit that generates feature maps of the plurality of first sub-input images using a second neural network having a second depth deeper than the first depth and, based on the feature maps generated through the second neural network, secondarily selects a plurality of second sub-input images that include a face region from among the first sub-input images; and a third face detection unit that generates feature maps of the plurality of second sub-input images using a third neural network having a third depth deeper than the second depth and, based on the feature maps generated through the third neural network, selects a face image including a face region from among the plurality of second sub-input images.
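
The three-stage selection described above can be illustrated with a minimal sketch. It assumes that each face detection unit can be reduced to a scoring function plus a threshold. The names `FaceDetectionStage` and `run_cascade`, the toy scoring functions, the depth values, and the thresholds are all hypothetical; they stand in for the neural networks and feature maps, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: an "image" is a short list of floats so the example stays
# self-contained. A real system would pass pixel arrays to a network.
Image = Sequence[float]


@dataclass
class FaceDetectionStage:
    """One face detection unit: a scorer of a given depth plus a threshold."""
    name: str
    depth: int                        # hypothetical layer count of the network
    score: Callable[[Image], float]   # stand-in for "feature map -> face score"
    threshold: float                  # hypothetical cut-off for keeping an image

    def select(self, images: List[Image]) -> List[Image]:
        # Keep only the images this stage judges to contain a face region.
        return [img for img in images if self.score(img) >= self.threshold]


def run_cascade(
    stages: List[FaceDetectionStage], images: List[Image]
) -> Tuple[List[Image], List[int]]:
    """Apply the stages in order and record how many images survive each one."""
    # Each later stage must be deeper than the one before it.
    for earlier, later in zip(stages, stages[1:]):
        if later.depth <= earlier.depth:
            raise ValueError("stages must be ordered by strictly increasing depth")

    candidates = list(images)
    counts = [len(candidates)]
    for stage in stages:
        candidates = stage.select(candidates)
        counts.append(len(candidates))
    return candidates, counts


# Toy scorers: cheap for the shallow stage, stricter for the deeper stages.
stages = [
    FaceDetectionStage("first", depth=4, score=lambda im: im[0], threshold=0.5),
    FaceDetectionStage("second", depth=8, score=lambda im: sum(im[:2]) / 2, threshold=0.6),
    FaceDetectionStage("third", depth=16, score=lambda im: min(im), threshold=0.6),
]

images = [
    [0.9, 0.8, 0.95],
    [0.2, 0.1, 0.3],
    [0.7, 0.4, 0.2],
    [0.8, 0.9, 0.5],
]

faces, counts = run_cascade(stages, images)
print(faces)   # images that passed all three stages
print(counts)  # number of candidates before and after each stage
```

In this sketch the shallow first stage sees every input image, while the deeper and presumably more expensive stages only see the candidates that survived earlier filtering. That ordering is the mechanism the abstract credits for combining speed with accuracy.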