A method and system for training a neural network of a visual recognition computer system, extracts at least one feature of an image or video frame with a feature extractor; approximates the at least one feature of the image or video frame with an auxiliary output provided in the neural network; and measures a feature difference between the extracted at least one feature of the image or video frame and the approximated at least one feature of the image or video frame with an auxiliary error calculator. A joint learner of the method and system adjusts at least one parameter of the neural network to minimize the measured feature difference.
展开▼