IEEE Transactions on Multimedia

Deep0Tag: Deep Multiple Instance Learning for Zero-Shot Image Tagging


Abstract

Zero-shot learning aims to perform visual reasoning about unseen objects. In line with the success of deep learning on object recognition problems, several end-to-end deep models for zero-shot recognition have been proposed in the literature. These models succeed at predicting a single unseen label for an input image but do not scale to cases where multiple unseen objects are present. Here, we focus on the challenging problem of zero-shot image tagging, where an image is assigned multiple labels that may relate to objects, attributes, actions, events, and scene type. Discovering these scene concepts requires the ability to process multi-scale information. To encompass global as well as local image details, we propose an automatic approach that locates relevant image patches and models image tagging within the Multiple Instance Learning (MIL) framework. To the best of our knowledge, this is the first end-to-end trainable deep MIL framework for the multi-label zero-shot tagging problem. We explore several alternatives for instance-level evidence aggregation and perform an extensive ablation study to identify the optimal pooling strategy. Due to its novel design, the proposed framework has several interesting features: 1) unlike previous deep MIL models, it does not use any offline procedure (e.g., Selective Search or EdgeBoxes) for bag generation; 2) at test time, it can process any number of unseen labels given their semantic embedding vectors; 3) using only image-level seen labels as weak annotation, it can produce a localized bounding box for each predicted label. We experiment with the large-scale NUS-WIDE and MS-COCO datasets and achieve superior performance across conventional, zero-shot, and generalized zero-shot tagging tasks.
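
To make the MIL aggregation step concrete, the following is a minimal PyTorch sketch of the idea the abstract describes: each image patch (instance) is projected into the semantic embedding space, scored against label word vectors, and the instance-level scores are pooled into a bag-level tag score. The class name, dimensions, and pooling choices here are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming CNN patch features and word-vector label
# embeddings; all names and sizes below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroShotMILTagger(nn.Module):
    """Scores a bag of image patches against semantic label embeddings."""

    def __init__(self, feat_dim=2048, embed_dim=300, pooling="max"):
        super().__init__()
        self.project = nn.Linear(feat_dim, embed_dim)  # visual -> semantic space
        self.pooling = pooling

    def forward(self, patch_feats, label_embeds):
        # patch_feats:  (P, feat_dim)  features of P candidate image patches
        # label_embeds: (L, embed_dim) word vectors; unseen labels can be
        #               supplied here at test time without retraining.
        inst = self.project(patch_feats)          # (P, embed_dim)
        scores = inst @ label_embeds.t()          # (P, L) instance-level evidence
        if self.pooling == "max":
            bag, best_patch = scores.max(dim=0)   # strongest patch per label
        elif self.pooling == "mean":
            bag, best_patch = scores.mean(dim=0), None
        else:  # log-sum-exp: a smooth alternative between max and mean
            bag, best_patch = torch.logsumexp(scores, dim=0), None
        return bag, best_patch

# Training uses only image-level seen labels as weak supervision.
model = ZeroShotMILTagger()
patches = torch.randn(36, 2048)   # hypothetical patch features for one image
seen = torch.randn(81, 300)       # hypothetical seen-label embeddings
bag_scores, loc = model(patches, seen)
loss = F.binary_cross_entropy_with_logits(bag_scores, torch.zeros(81))
```

Because labels enter only through their embedding vectors, any number of unseen labels can be scored at test time by passing their word vectors, and under max pooling the index of the winning patch gives a weakly supervised localization for each predicted tag.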
