
Towards Supporting Visual Question and Answering Applications


Abstract

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision and natural language processing to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to that image, and to generate a textual answer to the question. There are two key research problems in VQA: image understanding and question answering. My research focuses on developing solutions to support solving these two problems.

In image understanding, one important research area is semantic segmentation, which takes an image as input and outputs a label for each pixel. Because substantial manual work is needed to label a useful training set, typical training sets for such supervised approaches are small. There are also approaches with relaxed labeling requirements, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, more and more user-uploaded images are available online. Such user-generated content often comes with labels such as tags and may be coarsely labeled by various tools. To use this information for computer vision tasks, I propose a new graphical model that considers neighborhood information and its interactions to obtain pixel-level labels for images that have only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best-answer prediction and addresses two main research topics: feature design and model construction. For feature design, most existing work discusses how to design effective features for answer quality or best-answer prediction, but little work considers how to design features that capture the relationships among the answers to a given question. To fill this research gap, I designed new features that help improve prediction performance. For model construction, to exploit the structure of the feature space, I proposed a learning-to-rank model based on the hierarchical lasso. Experiments comparing the proposed methods with the state of the art in the best-answer prediction literature confirm that they are effective and suitable for the research task.
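The abstract does not give the details of the proposed graphical model, so the following is only a minimal illustrative sketch of the weakly supervised setting it describes: per-pixel scores are restricted to the classes listed in the (possibly incomplete) image-level labels, and a pairwise neighborhood term encourages agreement between adjacent pixels, optimized here with simple iterated conditional modes. All function names, the ICM choice, and the toy data are assumptions, not the dissertation's method.

```python
# Illustrative sketch only: pixel labeling from image-level labels plus a
# 4-neighborhood smoothness term, optimized with iterated conditional modes.
import numpy as np

def weakly_supervised_labels(unary, image_labels, pairwise_weight=1.0, n_iters=10):
    """unary: (H, W, L) per-pixel class scores (higher is better);
    image_labels: class indices known to appear in the image (may be incomplete)."""
    H, W, L = unary.shape
    allowed = np.full(L, -np.inf)
    allowed[list(image_labels)] = 0.0        # forbid classes absent from the image-level labels
    scores = unary + allowed                  # broadcast the restriction over all pixels
    labels = scores.reshape(-1, L).argmax(axis=1).reshape(H, W)

    for _ in range(n_iters):                  # ICM: update each pixel given its neighbors
        for i in range(H):
            for j in range(W):
                local = scores[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        local[labels[ni, nj]] += pairwise_weight  # reward neighbor agreement
                labels[i, j] = local.argmax()
    return labels

# Usage: 3 classes, but the image-level labels say only classes {0, 2} are present.
rng = np.random.default_rng(0)
unary = rng.normal(size=(8, 8, 3))
print(weakly_supervised_labels(unary, image_labels={0, 2}))
```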
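Likewise, for the question-answering part, the sketch below only illustrates the two ideas named in the abstract: features that compare an answer against the other answers to the same question, and a pairwise learning-to-rank objective with a sparsity penalty. The relational features chosen here and the plain L1 proximal step (a simplified stand-in for the hierarchical lasso) are assumptions for illustration, not the dissertation's actual features or model.

```python
# Illustrative sketch only: pairwise learning-to-rank for best-answer prediction
# with relational (within-question) features and an L1 proximal step.
import numpy as np

def relational_features(answer_lengths):
    """Features that compare each answer with the other answers to the same question."""
    lengths = np.asarray(answer_lengths, dtype=float)
    rel = lengths / (lengths.mean() + 1e-9)                         # length relative to the question's mean
    rank = lengths.argsort().argsort() / max(len(lengths) - 1, 1)   # normalized length rank
    return np.stack([lengths, rel, rank], axis=1)

def train_pairwise_ranker(questions, lr=0.05, l1=0.01, epochs=50):
    """questions: list of (X, best_idx) with X of shape (n_answers, n_features)."""
    d = questions[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(epochs):
        for X, best in questions:
            margins = X[best] @ w - X @ w          # the best answer should outscore the rest
            for j in np.where(margins < 1.0)[0]:   # hinge loss on violated pairs
                if j != best:
                    w += lr * (X[best] - X[j])
            w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)  # L1 proximal (soft-threshold) step
    return w

# Usage: two toy questions; the second element marks which answer was accepted as best.
q1 = (relational_features([120, 40, 60]), 0)
q2 = (relational_features([30, 200, 90]), 1)
w = train_pairwise_ranker([q1, q2])
print("predicted best answer for q1:", (q1[0] @ w).argmax())
```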

Bibliographic Record

  • Author: Tian, Qiongjie
  • Author Affiliation: Arizona State University
  • Degree-Granting Institution: Arizona State University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 130 p.
  • Total Pages: 130
  • Original Format: PDF
  • Language: English (eng)
  • Date Added to Database: 2022-08-17 11:38:52

