Venue: Annual Conference on Neural Information Processing Systems

DeViSE: A Deep Visual-Semantic Embedding Model


Abstract

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources - such as text data - both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions achieving hit rates of up to 18% across thousands of novel labels never seen by the visual model.
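The abstract does not spell out how a label never seen in training can be predicted, so the following is only a minimal sketch of one common reading. It assumes that images and label names are mapped into a shared vector space and that prediction is a nearest-neighbor lookup by cosine similarity. The vectors, label names, and helper functions (`cosine`, `rank_labels`, `hit_at_k`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_labels(image_vec, label_vecs):
    """Return label names sorted from most to least similar to the image."""
    return sorted(
        label_vecs,
        key=lambda name: cosine(image_vec, label_vecs[name]),
        reverse=True,
    )

def hit_at_k(ranked, true_label, k):
    """True if the correct label appears among the top-k ranked labels."""
    return true_label in ranked[:k]

# Hypothetical 3-d label embeddings (assumed to come from a text model).
# "lynx" stands in for a label with no training images: it is reachable
# only because it has a vector in the same space as the seen labels.
LABEL_VECS = {
    "tabby cat": [0.9, 0.1, 0.0],
    "lynx": [0.8, 0.3, 0.1],
    "sports car": [0.0, 0.1, 0.9],
    "minivan": [0.1, 0.2, 0.8],
}

# Hypothetical output of a visual model projected into the label space.
image_vec = [0.8, 0.35, 0.1]

ranked = rank_labels(image_vec, LABEL_VECS)
print(ranked)
print("hit@1:", hit_at_k(ranked, "lynx", 1))
```

Under this reading, a "hit rate" is the fraction of test images whose true label lands in the top k of such a ranking. The sketch also shows why errors tend to be semantically reasonable: the runner-up labels are the nearest neighbors of the correct one in the embedding space.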
