Technology is disclosed that improves language processing engines by using multi-media (e.g., image or video) context data when training and applying language models. Multi-media context data can be obtained from one or more sources, such as identification of objects, locations, or people in the multi-media; characteristics of the multi-media; labels or characteristics provided by its author; or information about the author. This context data can be used as additional input to a machine learning process that creates a model used in language processing. The resulting model can be used in various language processing engines, such as a translation engine, correction engine, or tagging engine, by taking the multi-media context or labeling for a content item as part of the input when computing results of the model.
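As a rough illustration of the idea, the sketch below feeds multi-media context labels into a model alongside the text, both at training time and at prediction time. It is not the disclosed implementation: the naive Bayes scorer, the `featurize` helper, the `ContextAwareClassifier` class, the `ctx:` prefix, and the toy English-to-Spanish data are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter, defaultdict


def featurize(text, media_labels):
    """Combine text tokens with media context labels into one feature list."""
    tokens = text.lower().split()
    # Context labels get a prefix so they never collide with ordinary words.
    return tokens + ["ctx:" + label.lower() for label in media_labels]


class ContextAwareClassifier:
    """Toy naive Bayes model over text tokens plus media-context features."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        # Each example is (text, media_labels, target_output).
        for text, media_labels, target in examples:
            self.class_counts[target] += 1
            for feat in featurize(text, media_labels):
                self.feature_counts[target][feat] += 1
                self.vocab.add(feat)

    def predict(self, text, media_labels):
        feats = featurize(text, media_labels)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for target, count in self.class_counts.items():
            denom = sum(self.feature_counts[target].values()) + len(self.vocab)
            score = math.log(count / total)
            for feat in feats:
                # Add-one smoothing keeps unseen features from zeroing the score.
                score += math.log((self.feature_counts[target][feat] + 1) / denom)
            if score > best_score:
                best, best_score = target, score
        return best


# Hypothetical training data: the text of a post, labels identified in the
# attached image, and the desired Spanish translation of the word "bat".
TRAINING = [
    ("he swung the bat", ["baseball", "stadium"], "bate"),
    ("nice bat", ["baseball"], "bate"),
    ("the bat flew away", ["cave", "animal"], "murciélago"),
    ("look at this bat", ["animal", "night"], "murciélago"),
]

model = ContextAwareClassifier()
model.train(TRAINING)
```

With this toy data, the same ambiguous text resolves differently depending on the media context supplied: `model.predict("what a bat", ["baseball"])` selects `"bate"`, while `model.predict("what a bat", ["animal"])` selects `"murciélago"`. A production engine would presumably use a far richer model, but the shape of the input (text plus context labels) is the point being illustrated.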