A visual target tracking method and apparatus based on deep adversarial training. The method includes: dividing each video frame of the video data into several search regions; for each search region, inputting a target template and the search region into a response map regression network, which outputs a response map corresponding to the target; for each search region, inputting the target template, the search region, and its response map into a discrimination network, which outputs a score for that search region; and using the positioning information of the highest-scoring search region as the positioning information of the target in the video frame. Because it constructs multiple search regions, the method can effectively track a target whose length-width (aspect) ratio changes. Combining the response map regression network with the discrimination network allows end-to-end processing.
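The per-frame selection loop can be sketched as below. The abstract does not say how the search regions are built, so generating them as a small grid of scales and aspect ratios around the previous box is an assumption. `regress`, `discriminate`, and `crop` are hypothetical stand-ins for the trained response map regression network, the discrimination network, and an image-cropping routine. The sketch only illustrates the "score every region, keep the best" logic. It is not the patented implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical box format: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


@dataclass
class SearchRegion:
    box: Box      # positioning information of this candidate region
    pixels: Any   # cropped image patch (left abstract in this sketch)


def make_search_regions(frame: Any, prev_box: Box, scales: Sequence[float],
                        aspect_ratios: Sequence[float],
                        crop: Callable[[Any, Box], Any]) -> List[SearchRegion]:
    """Build candidate regions around the previous box (assumed scheme).

    Each aspect ratio r widens the box by sqrt(r) and shortens it by sqrt(r),
    so the area depends only on the scale s while w/h is multiplied by r.
    """
    cx, cy, w, h = prev_box
    regions = []
    for s in scales:
        for r in aspect_ratios:
            box = (cx, cy, w * s * r ** 0.5, h * s / r ** 0.5)
            regions.append(SearchRegion(box, crop(frame, box)))
    return regions


def track_frame(frame: Any, template: Any, prev_box: Box,
                regress: Callable[[Any, Any], Any],
                discriminate: Callable[[Any, Any, Any], float],
                crop: Callable[[Any, Box], Any],
                scales: Sequence[float] = (0.95, 1.0, 1.05),
                aspect_ratios: Sequence[float] = (0.8, 1.0, 1.25)) -> Box:
    """Return the box of the highest-scoring search region for one frame."""
    best_score, best_box = float("-inf"), prev_box
    for region in make_search_regions(frame, prev_box, scales,
                                      aspect_ratios, crop):
        # Stand-in for the response map regression network.
        response = regress(template, region.pixels)
        # Stand-in for the discrimination network: scores (template, region, response).
        score = discriminate(template, region.pixels, response)
        if score > best_score:
            best_score, best_box = score, region.box
    return best_box
```

Because aspect ratio is one of the dimensions the candidate regions vary over, a target that becomes wider or taller can still be matched by some region. The discriminator's score then decides which region's box is reported.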