DeViSE: A Deep Visual-Semantic Embedding Model

Abstract

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources, such as text data, both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions, achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model.
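
The abstract does not spell out how zero-shot predictions are made or scored, so the following is only a minimal sketch under stated assumptions: an image is assumed to be already projected into the same vector space as label embeddings learned from text, labels are ranked by cosine similarity, and a "hit" means the true label appears among the top k ranked labels. The function names (`cosine`, `rank_labels`, `hit_at_k`) and all vector values are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_labels(image_vec, label_vecs):
    """Return label names sorted from most to least similar to image_vec."""
    return sorted(
        label_vecs,
        key=lambda name: cosine(image_vec, label_vecs[name]),
        reverse=True,
    )


def hit_at_k(ranked, true_label, k):
    """True if the correct label is among the k highest-ranked labels."""
    return true_label in ranked[:k]


# Toy 3-dimensional label embeddings (made-up values for illustration).
# Assume "tiger" was never seen by the visual model during training.
label_vecs = {
    "cat": [0.9, 0.1, 0.0],
    "tiger": [0.8, 0.3, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a tiger photo after projection into label space.
image_vec = [0.7, 0.4, 0.1]

ranked = rank_labels(image_vec, label_vecs)
print(ranked)
print(hit_at_k(ranked, "tiger", 1))
```

Because the label vectors come from text rather than from labeled images, a label such as "tiger" can be ranked even if no training image carried it. A hit rate over a test set would then be the fraction of images for which `hit_at_k` returns `True`.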

Bibliographic details

Source: Annual conference on Neural Information Processing Systems | 2013 | 9 pages
Authors: Andrea Frome; Greg S. Corrado; Jonathon Shlens; Samy Bengio; Jeffrey Dean; Marc'Aurelio Ranzato; Tomas Mikolov
Original format: PDF
Chinese Library Classification: Information processing

Similar literature

1. A deep person re-identification model with multi visual-semantic information embedding [J]. Wang Xiaopei, Liu Xiaoxia, Guo Jun. Multimedia Tools and Applications. 2021, No. 5
2. Target-Oriented Deformation of Visual-Semantic Embedding Space [J]. Takashi MATSUBARA. IEICE transactions on information and systems. 2021, No. 1
3. Spatiotemporal visual-semantic embedding network for zero-shot action recognition [J]. An Rongqiao, Miao Zhenjiang, Li Qingyu. Journal of electronic imaging. 2019, No. 2
4. DeViSE: A Deep Visual-Semantic Embedding Model [C]. Andrea Frome, Greg S. Corrado, Jonathon Shlens. Annual conference on Neural Information Processing Systems. 2013
5. Learning Robust Visual-Semantic Retrieval Models with Limited Supervision [D]. Mithun, Niluthpol Chowdhury. 2019
6. Prediction of Drug–Target Interactions From Multi-Molecular Network Based on Deep Walk Embedding Model [O]. Zhan-Heng Chen, Zhu-Hong You, Zhen-Hao Guo. 2020
7. Target-Oriented Deformation of Visual-Semantic Embedding Space [O]. Takashi MATSUBARA. 2021
