Embodiments of this application disclose an image description generation method performed at a computing device. The method includes: obtaining a target image; generating a first global feature vector and a first label vector set of the target image; applying the target image to a matching model and generating a first multi-mode feature vector of the target image through the matching model, the matching model being a model obtained through training according to a training image and reference image description information of the training image; and generating target image description information of the target image according to the first multi-mode feature vector, the first global feature vector, and the first label vector set.
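The data flow of the claimed method can be sketched as follows. This is only an illustrative outline, not the implementation disclosed in the application: every function name, the toy label vocabulary, and the threshold-based "decoder" are hypothetical stand-ins chosen so the example runs without any trained model. In a real embodiment, the matching model and the description generator would be trained models.

```python
from typing import List, Sequence

Vector = List[float]
# Toy image: rows of grayscale pixel intensities in [0, 1] (assumption).
Image = Sequence[Sequence[float]]

# Hypothetical label vocabulary with toy embeddings (not from the source).
LABEL_EMBEDDINGS = {
    "bright": [1.0, 0.0],
    "dark": [0.0, 1.0],
}


def global_feature_vector(image: Image) -> Vector:
    """Toy first global feature vector: mean and max intensity of the image."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def label_vector_set(image: Image) -> List[Vector]:
    """Toy first label vector set: embed labels chosen by a mean threshold."""
    mean = global_feature_vector(image)[0]
    labels = ["bright"] if mean >= 0.5 else ["dark"]
    return [LABEL_EMBEDDINGS[name] for name in labels]


def matching_model(image: Image) -> Vector:
    """Stand-in for the trained matching model.

    The real model is trained on training images paired with reference
    descriptions; here it simply returns per-row mean intensities.
    """
    return [sum(row) / len(row) for row in image]


def generate_description(multi_mode: Vector, global_vec: Vector,
                         label_vecs: List[Vector]) -> str:
    """Toy generator: fuse the three inputs and threshold their mean."""
    fused = multi_mode + global_vec + [x for vec in label_vecs for x in vec]
    score = sum(fused) / len(fused)
    return "a bright scene" if score >= 0.5 else "a dark scene"


def describe_image(image: Image) -> str:
    """Run the four steps of the method in order on a target image."""
    global_vec = global_feature_vector(image)
    label_vecs = label_vector_set(image)
    multi_mode = matching_model(image)
    return generate_description(multi_mode, global_vec, label_vecs)


if __name__ == "__main__":
    print(describe_image([[0.9, 0.8], [0.7, 1.0]]))
```

The point of the sketch is the structure: the description generator receives three separate inputs (the multi-mode feature vector, the global feature vector, and the label vector set) rather than a single image encoding.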