A METHOD FOR TRAINING A CONVOLUTIONAL NEURAL NETWORK FOR IMAGE RECOGNITION USING IMAGE-CONDITIONED MASKED LANGUAGE MODELING
Abstract
A method and system pre-train a convolutional neural network for image recognition based upon masked language modeling by inputting, to the convolutional neural network, an image; outputting, from the convolutional neural network, a visual embedding tensor of visual embedding vectors; tokenizing a caption to create a list of tokens, at least one token having visual correspondence to the image received by the convolutional neural network; randomly selecting one of the tokens in the list of tokens to be masked, the selected token being taken as ground truth; computing, using a language model neural network, hidden representations of the tokens; using the hidden representation of the masked token as a query vector to attentively pool the visual embedding vectors in the visual embedding tensor; predicting the masked token by mapping the pooled visual embedding vectors to the tokens; determining a prediction loss associated with the masked token; and back-propagating the prediction loss to the convolutional neural network to tune the parameters thereof.
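The forward pass described in the abstract can be illustrated with a small pure-Python sketch. It is not the patented implementation: the random vectors stand in for the outputs of the convolutional neural network and the language model, and the scaled dot-product attention, the dot-product mapping from the pooled vector to token logits, the toy vocabulary, and all function names (`attention_pool`, `masked_token_loss`) are assumptions made for illustration. Back-propagation is omitted; in practice an automatic differentiation framework would propagate the loss back into the convolutional network.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(query, visual_vectors):
    """Pool the visual embedding vectors using attention weights from the query.

    Assumption: scaled dot-product attention; the abstract only says "attentively".
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in visual_vectors])
    dim = len(visual_vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, visual_vectors))
              for i in range(dim)]
    return pooled, weights


def masked_token_loss(pooled, vocab_embeddings, target_id):
    """Cross-entropy of the ground-truth token.

    Assumption: logits are dot products between the pooled visual vector
    and one embedding per vocabulary token.
    """
    probs = softmax([dot(pooled, e) for e in vocab_embeddings])
    return -math.log(probs[target_id]), probs


rng = random.Random(0)
dim = 4
vocab = ["[MASK]", "a", "dog", "on", "grass"]  # hypothetical toy vocabulary
vocab_embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in vocab]

# Stand-in for the CNN output: a flattened visual embedding tensor
# holding 6 visual embedding vectors (e.g. a 2 x 3 spatial grid).
visual_vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

# Tokenized caption; one token is chosen at random and masked,
# and its identity is kept as the ground truth.
tokens = ["a", "dog", "on", "grass"]
masked_pos = rng.randrange(len(tokens))
target_id = vocab.index(tokens[masked_pos])

# Stand-in for the language model's hidden representation of the masked token.
query = [rng.gauss(0, 1) for _ in range(dim)]

pooled, weights = attention_pool(query, visual_vectors)
loss, probs = masked_token_loss(pooled, vocab_embeddings, target_id)
print("masked token:", tokens[masked_pos], "loss:", loss)
```

Because the masked token can only be predicted from the pooled visual vectors, the loss gradient must flow through the visual embeddings, which is what pushes the convolutional network to encode the visual content the caption refers to.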