Provided is a method for automatically describing image content based on the construction of a Chinese visual vocabulary. The method comprises the following steps, performed in order: step a, using a Chinese word segmentation tool to segment the several descriptive sentences corresponding to a single picture, selectively retaining nouns, verbs and adjectives from the resulting word list according to their statistical word frequencies, and forming a Chinese visual vocabulary from the retained words; step b, predicting the words of the Chinese visual vocabulary by means of a Chinese visual vocabulary prediction network, to obtain image label information; and step c, on the basis of an automatic image description model, using an encoder to extract convolutional features of the image, and then using a decoder, which takes the convolutional features as its initial input, to decode them into a Chinese descriptive sentence. The image label information is obtained by predicting the vocabulary with the vocabulary prediction network, and a residual structure is added to the Chinese visual vocabulary prediction network, so that the problem of network degradation as the number of layers of that network increases can be effectively addressed.
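Step a can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the captions have already been segmented and part-of-speech tagged by some tool, tags beginning with "n", "v" or "a" mark nouns, verbs and adjectives, and the frequency criterion is a simple minimum count. The names `build_visual_vocabulary`, `image_labels` and `min_count` are hypothetical, and the multi-hot label vector is only one plausible way to represent the image label information of step b.

```python
from collections import Counter

# Assumption: POS tags starting with these prefixes denote
# nouns, verbs and adjectives; the abstract names no tag scheme.
KEPT_POS_PREFIXES = ("n", "v", "a")


def build_visual_vocabulary(tagged_captions, min_count=2):
    """Return retained words, most frequent first.

    tagged_captions: list of captions, each a list of (word, pos) pairs.
    min_count: hypothetical frequency threshold for keeping a word.
    """
    counts = Counter(
        word
        for caption in tagged_captions
        for word, pos in caption
        if pos.startswith(KEPT_POS_PREFIXES)
    )
    # Sort by descending count, then by word, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, count in ranked if count >= min_count]


def image_labels(tagged_captions_for_image, vocabulary):
    """Return a multi-hot vector marking vocabulary words found in the captions."""
    index = {word: i for i, word in enumerate(vocabulary)}
    labels = [0] * len(vocabulary)
    for caption in tagged_captions_for_image:
        for word, _pos in caption:
            if word in index:
                labels[index[word]] = 1
    return labels


# Illustrative, made-up captions for a single picture.
example_captions = [
    [("一个", "m"), ("男人", "n"), ("在", "p"), ("骑", "v"), ("马", "n")],
    [("男人", "n"), ("骑", "v"), ("着", "u"), ("棕色", "n"), ("的", "u"), ("马", "n")],
    [("一匹", "m"), ("漂亮", "a"), ("的", "u"), ("马", "n")],
]

vocabulary = build_visual_vocabulary(example_captions, min_count=2)
labels = image_labels(example_captions[2:], vocabulary)
```

In this sketch, words seen fewer than `min_count` times are dropped, and the resulting label vector could serve as a training target for a vocabulary prediction network; the abstract does not specify either detail.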