
Towards Supporting Visual Question and Answering Applications


Abstract

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision and natural language processing to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to that image, and to generate a textual answer to the question. There are two key research problems in VQA: image understanding and question answering. My research focuses on developing solutions to support solving these two problems.

In image understanding, one important research area is semantic segmentation, which takes an image as input and outputs a label for each pixel. Because substantial manual work is needed to label a useful training set, typical training sets for such supervised approaches are small. There are also approaches with relaxed labeling requirements, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, more and more user-uploaded images are available online. Such user-generated content often comes with labels such as tags and may be coarsely labeled by various tools. To use this information for computer vision tasks, I propose a new graphical model that considers neighborhood information and its interactions to obtain pixel-level labels for images that have only incomplete image-level labels. The method was evaluated on both synthetic and real images.

In question answering, my research centers on best-answer prediction and addresses two main research topics: feature design and model construction. For feature design, most existing work discusses how to design effective features for answer quality or best-answer prediction, but little work considers how to design features that capture the relationships among the answers to a given question. To fill this research gap, I designed new features that help improve prediction performance. For model construction, to exploit the structure of the feature space, I proposed a learning-to-rank model based on the hierarchical lasso. Experiments comparing the proposed methods with the state of the art in the best-answer prediction literature confirm that they are effective and suitable for the research task.
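The abstract does not give the details of the proposed graphical model, so the following is only a minimal illustrative sketch of the weakly supervised setting it describes: per-pixel scores are restricted to the classes listed in the (possibly incomplete) image-level labels, and a pairwise neighborhood term encourages agreement between adjacent pixels, optimized here with simple iterated conditional modes. All function names, the ICM choice, and the toy data are assumptions, not the dissertation's method.

```python
# Illustrative sketch only: pixel labeling from image-level labels plus a
# 4-neighborhood smoothness term, optimized with iterated conditional modes.
import numpy as np

def weakly_supervised_labels(unary, image_labels, pairwise_weight=1.0, n_iters=10):
    """unary: (H, W, L) per-pixel class scores (higher is better);
    image_labels: class indices known to appear in the image (may be incomplete)."""
    H, W, L = unary.shape
    allowed = np.full(L, -np.inf)
    allowed[list(image_labels)] = 0.0        # forbid classes absent from the image-level labels
    scores = unary + allowed                  # broadcast the restriction over all pixels
    labels = scores.reshape(-1, L).argmax(axis=1).reshape(H, W)

    for _ in range(n_iters):                  # ICM: update each pixel given its neighbors
        for i in range(H):
            for j in range(W):
                local = scores[i, j].copy()
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        local[labels[ni, nj]] += pairwise_weight  # reward neighbor agreement
                labels[i, j] = local.argmax()
    return labels

# Usage: 3 classes, but the image-level labels say only classes {0, 2} are present.
rng = np.random.default_rng(0)
unary = rng.normal(size=(8, 8, 3))
print(weakly_supervised_labels(unary, image_labels={0, 2}))
```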
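Likewise, for the question-answering part, the sketch below only illustrates the two ideas named in the abstract: features that compare an answer against the other answers to the same question, and a pairwise learning-to-rank objective with a sparsity penalty. The relational features chosen here and the plain L1 proximal step (a simplified stand-in for the hierarchical lasso) are assumptions for illustration, not the dissertation's actual features or model.

```python
# Illustrative sketch only: pairwise learning-to-rank for best-answer prediction
# with relational (within-question) features and an L1 proximal step.
import numpy as np

def relational_features(answer_lengths):
    """Features that compare each answer with the other answers to the same question."""
    lengths = np.asarray(answer_lengths, dtype=float)
    rel = lengths / (lengths.mean() + 1e-9)                         # length relative to the question's mean
    rank = lengths.argsort().argsort() / max(len(lengths) - 1, 1)   # normalized length rank
    return np.stack([lengths, rel, rank], axis=1)

def train_pairwise_ranker(questions, lr=0.05, l1=0.01, epochs=50):
    """questions: list of (X, best_idx) with X of shape (n_answers, n_features)."""
    d = questions[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(epochs):
        for X, best in questions:
            margins = X[best] @ w - X @ w          # the best answer should outscore the rest
            for j in np.where(margins < 1.0)[0]:   # hinge loss on violated pairs
                if j != best:
                    w += lr * (X[best] - X[j])
            w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)  # L1 proximal (soft-threshold) step
    return w

# Usage: two toy questions; the second element marks which answer was accepted as best.
q1 = (relational_features([120, 40, 60]), 0)
q2 = (relational_features([30, 200, 90]), 1)
w = train_pairwise_ranker([q1, q2])
print("predicted best answer for q1:", (q1[0] @ w).argmax())
```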

Bibliographic Record

  • Author: Tian, Qiongjie
  • Author Affiliation: Arizona State University
  • Degree-Granting Institution: Arizona State University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 130 p.
  • Total Pages: 130
  • Original Format: PDF
  • Language: English (eng)
  • Date Added to Database: 2022-08-17 11:38:52

