IEEE Conference on Computer Vision and Pattern Recognition

VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes


Abstract

We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like "eats" and "stares at" into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.
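The mechanism can be sketched compactly. The snippet below is a minimal illustration, not the authors' released implementation: it assumes visual features of abstract scenes have already been grouped into a small number of clusters, and it fine-tunes word2vec-style embeddings so that the words of a caption predict the cluster of the scene they describe. The vocabulary, dimensions, and hand-assigned cluster ids are synthetic placeholders.

```python
# Minimal sketch of visually grounded embedding fine-tuning (illustrative only).
# Captions describing scenes from the same visual cluster push their words
# toward a shared cluster label, so visually related words move closer together.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["person", "eats", "stares", "at", "food", "dog", "runs", "park"]
word_to_id = {w: i for i, w in enumerate(vocab)}

V, D, K = len(vocab), 16, 3          # vocab size, embedding dim, visual clusters
W_in = rng.normal(0, 0.1, (V, D))    # word embeddings (e.g. initialized from word2vec)
W_out = rng.normal(0, 0.1, (D, K))   # cluster-prediction weights

# (caption words, visual cluster id) pairs; in practice the cluster id would
# come from clustering abstract-scene features, here it is assigned by hand.
data = [
    (["person", "eats", "food"], 0),
    (["person", "stares", "at", "food"], 0),
    (["dog", "runs", "park"], 1),
]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.1
for epoch in range(200):
    for words, cluster in data:
        ids = [word_to_id[w] for w in words]
        h = W_in[ids].mean(axis=0)        # CBOW-style average of the caption's word vectors
        p = softmax(h @ W_out)            # predicted distribution over visual clusters
        err = p.copy()
        err[cluster] -= 1.0               # gradient of cross-entropy w.r.t. the logits
        grad_h = W_out @ err
        W_out -= lr * np.outer(h, err)
        for i in ids:                     # spread the gradient back to each word vector
            W_in[i] -= lr * grad_h / len(ids)

# Cosine similarity between two words after fine-tuning.
def sim(a, b):
    va, vb = W_in[word_to_id[a]], W_in[word_to_id[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print("cos(eats, stares) =", round(sim("eats", "stares"), 3))
```

Because the eating and staring captions share a visual cluster in this toy setup, the similarity between "eats" and "stares" increases during training, which is the kind of visually grounded relatedness the abstract describes.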
