Embodiments of a multimodal question answering (mQA) model for answering questions about the content of an image are presented. In an embodiment, the model comprises four components: a long short-term memory (LSTM) component for extracting a representation of the question, a convolutional neural network (CNN) component for extracting a visual representation of the image, a second LSTM component for storing the linguistic context of the answer, and a fusing component for combining the information from the first three components and generating the answer. A Freestyle Multilingual Image Question Answering (FM-IQA) dataset was constructed to train and evaluate embodiments of the mQA model. The quality of the answers that the mQA model generates on this dataset is evaluated by human judges through a Turing test.
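The data flow among the four components can be illustrated with a minimal sketch. It is not the claimed implementation: the encoders below are toy, deterministic stand-ins rather than a trained LSTM or CNN, and every name (`toy_embed`, `encode_sequence`, `encode_image`, `fuse`, `generate_answer`), the element-wise sum used for fusion, and the greedy word choice are assumptions made only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def toy_embed(word: str, dim: int = 4) -> Vector:
    """Hypothetical word embedding: a deterministic toy feature vector."""
    return [((len(word) + i) % 3) / 3.0 for i in range(dim)]


def encode_sequence(words: Sequence[str], dim: int = 4) -> Vector:
    """Stand-in for an LSTM: folds a word sequence into one fixed-size state."""
    state = [0.0] * dim
    for word in words:
        emb = toy_embed(word, dim)
        # Toy recurrent update: decay the old state and mix in the new word.
        state = [0.5 * s + e for s, e in zip(state, emb)]
    return state


def encode_image(pixels: Sequence[Sequence[float]], dim: int = 4) -> Vector:
    """Stand-in for a CNN: reduces the image to its mean intensity."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat) if flat else 0.0
    return [mean] * dim


def fuse(question: Vector, image: Vector, context: Vector) -> Vector:
    """Fusing component (assumed here to be an element-wise sum)."""
    return [q + v + a for q, v, a in zip(question, image, context)]


def generate_answer(
    question: Sequence[str],
    pixels: Sequence[Sequence[float]],
    vocab: Sequence[str],
    max_len: int = 5,
    dim: int = 4,
) -> List[str]:
    """Generates an answer word by word until "<eos>" or max_len words."""
    q_vec = encode_sequence(question, dim)   # component 1: question encoder
    v_vec = encode_image(pixels, dim)        # component 2: image encoder
    answer: List[str] = []
    for _ in range(max_len):
        a_vec = encode_sequence(answer, dim)  # component 3: answer context
        fused = fuse(q_vec, v_vec, a_vec)     # component 4: fusion
        # Greedy choice: the vocabulary word whose toy embedding has the
        # largest dot product with the fused vector.
        word = max(
            vocab,
            key=lambda w: sum(f * e for f, e in zip(fused, toy_embed(w, dim))),
        )
        if word == "<eos>":
            break
        answer.append(word)
    return answer
```

A call such as `generate_answer(["what", "is", "this"], [[0.1, 0.2], [0.3, 0.4]], ["a", "cat", "<eos>"])` returns a list of words drawn from the given vocabulary. Because the encoders are placeholders, the output carries no meaning; the sketch only shows that the answer context is re-encoded at each step and fused with the fixed question and image representations.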