Journal: International Journal of Computer Vision

Combining Multiple Cues for Visual Madlibs Question Answering

Abstract

This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification, and attribute prediction. We also present a method for localizing phrases from candidate answers in order to provide spatial support for feature extraction. We map each of these features, together with candidate answers, to a joint embedding space through normalized canonical correlation analysis (nCCA). Finally, we solve an optimization problem to learn to combine scores from nCCA models trained on multiple cues to select the best answer. Extensive experimental results show a significant improvement over the previous state of the art and confirm that answering questions from a wide range of types benefits from examining a variety of image cues and carefully choosing the spatial support for feature extraction.
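
The final answer-selection step described above can be illustrated with a minimal sketch. It assumes each cue-specific nCCA model has already projected the image feature and the candidate answers into its own joint space, and that similarity there is measured by cosine. The function names, the toy vectors, and the hand-set weights are hypothetical. The paper learns the combination weights by solving an optimization problem that the abstract does not specify, so a plain weighted sum stands in for it here.

```python
import numpy as np


def cosine_scores(image_vec, answer_vecs):
    """Cosine similarity between one image embedding and each candidate answer."""
    image_unit = image_vec / np.linalg.norm(image_vec)
    answer_units = answer_vecs / np.linalg.norm(answer_vecs, axis=1, keepdims=True)
    return answer_units @ image_unit


def select_answer(per_cue_scores, weights):
    """Combine per-cue scores with a weighted sum and pick the best candidate.

    per_cue_scores: array-like of shape (n_cues, n_candidates)
    weights:        array-like of shape (n_cues,), assumed to be learned elsewhere
    """
    scores = np.asarray(per_cue_scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    combined = w @ scores  # shape (n_candidates,)
    return int(np.argmax(combined)), combined


# Toy data: two cues (say, scene and activity), four candidate answers.
# Each cue has its own joint space, so each gets its own answer embeddings.
scene_image = np.array([1.0, 0.0])
scene_answers = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])

activity_image = np.array([0.0, 1.0])
activity_answers = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])

scene_scores = cosine_scores(scene_image, scene_answers)
activity_scores = cosine_scores(activity_image, activity_answers)

# Hypothetical equal weights; the paper learns these rather than fixing them.
best, combined = select_answer([scene_scores, activity_scores], [0.5, 0.5])
print(best, combined)
```

In this toy case the scene cue alone prefers candidate 0 and the activity cue alone prefers candidate 1, while the combined score selects candidate 2, which is moderately supported by both cues.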
