According to one aspect of the present invention, a deep learning based face image extraction apparatus includes a plurality of face detection units composed of neural networks of different depths, and performs fast and accurate face recognition by progressively reducing the number of input images while applying progressively deeper networks. The apparatus comprises: a first face detection unit that generates a feature map of a plurality of input images using a first neural network having a first depth and, based on that feature map, primarily selects a plurality of first sub-input images including a face region from among the plurality of input images; a second face detection unit that generates a feature map of the plurality of first sub-input images using a second neural network having a second depth deeper than the first depth and, based on the feature map generated through the second neural network, secondarily selects a plurality of second sub-input images including a face region from among the plurality of first sub-input images; and a third face detection unit that generates a feature map of the plurality of second sub-input images using a third neural network having a third depth deeper than the second depth and, based on the feature map generated through the third neural network, selects a face image including a face region from among the plurality of second sub-input images.
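The three-stage selection described above can be illustrated with a minimal sketch. It shows only the control flow of the cascade: each stage scores the candidates that survived the previous stage and keeps those at or above a threshold, so the more expensive stages see fewer images. Everything named here is an assumption for illustration and does not come from the source: the function `run_cascade`, the three stand-in scoring functions (which replace the first, second, and third neural networks and their feature maps), the thresholds, and the toy data.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]             # hypothetical stand-in for pixel data
Scorer = Callable[[Image], float]   # hypothetical stand-in for a network
Stage = Tuple[Scorer, float]        # (scoring function, keep threshold)


def run_cascade(images: Sequence[Image],
                stages: Sequence[Stage]) -> Tuple[List[Image], List[int]]:
    """Filter images through successive stages.

    Returns the surviving images and the candidate count before the
    first stage and after each stage.
    """
    candidates = list(images)
    counts = [len(candidates)]
    for score, threshold in stages:
        # Only survivors of the previous stage reach this (costlier) stage.
        candidates = [img for img in candidates if score(img) >= threshold]
        counts.append(len(candidates))
    return candidates, counts


# Stand-in scorers (assumptions, not real networks), ordered from the
# cheapest and most permissive to the strictest.
def shallow_score(img: Image) -> float:
    return sum(img) / len(img)                        # mean value


def medium_score(img: Image) -> float:
    return sum(1 for v in img if v > 0.5) / len(img)  # share of high values


def deep_score(img: Image) -> float:
    return min(img)                                   # weakest value


def demo() -> Tuple[List[Image], List[int]]:
    images = [
        [0.9, 0.8, 0.95],
        [0.1, 0.2, 0.1],
        [0.6, 0.7, 0.2],
        [0.8, 0.9, 0.7],
    ]
    stages = [
        (shallow_score, 0.4),   # first detection unit (first depth)
        (medium_score, 0.7),    # second detection unit (second depth)
        (deep_score, 0.75),     # third detection unit (third depth)
    ]
    return run_cascade(images, stages)
```

Calling `demo()` on the toy data reduces four inputs to three, then two, then one. In an actual implementation each scorer would be a neural network of increasing depth producing a feature map, and the selection rule applied to that feature map is not specified in this abstract.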