According to the present exemplary embodiment, a plurality of region maps is added to the visual question answering model, so that the model outputs not only the correct answer but also the regions selected in the process of inferring that answer, and the plurality of region features are combined and supplied to a sentence generation model. The present invention thereby provides a visual question answering apparatus and method capable of outputting a descriptive sentence about an object in the image other than the correct answer.
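The data flow described above can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`attend`, `vqa_with_region_maps`), the use of softmax attention as the "region map", the concatenation used to combine region features, and the random toy inputs are all assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(region_feats, query):
    """One region map (assumed here to be softmax attention over regions).

    Returns the per-region weights and the weighted sum of region features.
    """
    weights = softmax([dot(f, query) for f in region_feats])
    dim = len(region_feats[0])
    attended = [sum(w * f[d] for w, f in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended


def vqa_with_region_maps(region_feats, queries, answer_weights):
    """Hypothetical VQA head with one region map per query vector.

    Returns the answer index, the top region of each map, the maps
    themselves, and the combined region features.
    """
    maps, attended = [], []
    for q in queries:
        w, a = attend(region_feats, q)
        maps.append(w)
        attended.append(a)
    # Assumption: region features are combined by simple concatenation.
    combined = [x for a in attended for x in a]
    scores = [dot(row, combined) for row in answer_weights]
    answer_idx = max(range(len(scores)), key=scores.__getitem__)
    selected = [max(range(len(m)), key=m.__getitem__) for m in maps]
    return answer_idx, selected, maps, combined


# Toy inputs: 5 image regions with 4-dim features, 2 region maps, 3 answers.
random.seed(0)
regions = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
queries = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]
answer_w = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

answer, selected, maps, combined = vqa_with_region_maps(
    regions, queries, answer_w)
print("answer index:", answer)
print("selected region per map:", selected)
```

In this sketch, `selected` corresponds to the regions reported alongside the answer, and `combined` stands in for the region features that would be passed on to a sentence generation model, which is not modeled here.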