This paper proposes collaborative and context-aware visual question answering (C2VQA) for the integration of multi-modal information channels, and details its particular mapping and realization for biometrics forensic integration (BFI) using Show-and-Tell-like architectures. C2VQA, which expands on visual question answering (VQA) and the Visual Turing Test (VTT), engages deep semantic alignment and joint embedding using deep learning (DL) for image analysis, vector-space embeddings such as skip-grams and long-term dependencies captured by gated recurrent networks for context prediction, and multi-strategy learning including conformal prediction for control and meta-reasoning. C2VQA would engage in purposeful dialog to address and correct for misinformation and uncertainty, and considers behavior in order to model realistic VQA problems characteristic of open-set rather than closed-set VQA.
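A minimal sketch of the joint-embedding pattern the abstract describes, not the authors' implementation: a pretrained CNN is assumed to supply image features, a gated recurrent network (GRU) encodes the question over skip-gram-style word embeddings, and the two are fused to score candidate answers. All layer sizes, the vocabulary, and the elementwise-product fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointEmbeddingVQA(nn.Module):
    """Hypothetical joint image-question embedding for closed-set VQA."""

    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, num_answers=1000):
        super().__init__()
        # Word vectors; in practice these could be initialized from
        # pretrained skip-gram (word2vec) embeddings.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # GRU captures long-term dependencies across question tokens.
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Project CNN image features into the question-encoding space.
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Score a fixed answer vocabulary (closed-set baseline; the paper
        # argues for moving beyond this toward open-set VQA).
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_tokens):
        # question_tokens: (batch, seq_len) integer token ids
        _, h = self.gru(self.word_embed(question_tokens))
        q = h.squeeze(0)                           # (batch, hidden_dim)
        v = torch.tanh(self.img_proj(img_feats))   # (batch, hidden_dim)
        joint = q * v                              # elementwise-product fusion
        return self.classifier(joint)              # answer logits

# Usage with random stand-in inputs:
model = JointEmbeddingVQA()
img = torch.randn(4, 2048)                  # e.g., CNN pooled features
q = torch.randint(0, 10000, (4, 12))        # tokenized questions
logits = model(img, q)                      # (4, 1000) answer scores
```

In an open-set, uncertainty-aware setting like the one the paper advocates, the raw answer scores would additionally pass through a conformal-prediction layer that outputs calibrated prediction sets rather than a single forced choice.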