IEEE Conference on Computer Vision and Pattern Recognition

Aggregating Image and Text Quantized Correlated Components


Abstract

Cross-modal tasks arise naturally for multimedia content that can be described along two or more modalities, such as visual content and text. Such tasks require "translating" information from one modality to another. Methods like kernelized canonical correlation analysis (KCCA) attempt to solve them by finding aligned subspaces in the description spaces of the different modalities. Since they favor correlated information over modality-specific information, these methods have shown some success in both cross-modal and bi-modal tasks. However, we show that directly using the subspace alignment obtained by KCCA yields only coarse translation abilities. To address this problem, we first put forward a new representation method that aggregates the information provided by the projections of both modalities onto their aligned subspaces. We further propose a method relying on neighborhoods in these subspaces to complete uni-modal information. Our proposal achieves state-of-the-art results for bi-modal classification on Pascal VOC07 and improves the state of the art by over 60% for cross-modal retrieval on Flickr8K/30K.
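The pipeline the abstract describes — align the two description spaces, aggregate the aligned projections into a bi-modal representation, and complete a missing modality from neighbors in the aligned subspace — can be illustrated with a minimal sketch. The code below uses plain linear CCA from scikit-learn rather than the kernelized variant; the concatenation-based aggregation and the mean-of-neighbors completion are simplifying assumptions for illustration, not the paper's exact quantization and aggregation scheme.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Toy paired data: n samples described in two modalities,
# e.g. 128-d image features and 64-d text features sharing latent structure.
n, d_img, d_txt, k = 200, 128, 64, 10
latent = rng.normal(size=(n, k))
X_img = latent @ rng.normal(size=(k, d_img)) + 0.1 * rng.normal(size=(n, d_img))
X_txt = latent @ rng.normal(size=(k, d_txt)) + 0.1 * rng.normal(size=(n, d_txt))

# 1. Align the two description spaces (linear CCA stands in for KCCA here).
cca = CCA(n_components=k)
cca.fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)  # projections onto aligned subspaces

# 2. Aggregate: build a bi-modal representation from both projections
#    (concatenation is an illustrative choice, not the paper's method).
Z_bimodal = np.concatenate([Z_img, Z_txt], axis=1)

# 3. Neighborhood completion: for an image-only query, estimate the missing
#    text-side projection from its nearest neighbors in the aligned image
#    subspace (hypothetical helper; averaging is a simplifying assumption).
def complete_text_projection(z_img_query, Z_img_bank, Z_txt_bank, n_neighbors=5):
    dists = np.linalg.norm(Z_img_bank - z_img_query, axis=1)
    nn = np.argsort(dists)[:n_neighbors]
    return Z_txt_bank[nn].mean(axis=0)

z_txt_hat = complete_text_projection(cca.transform(X_img[:1])[0], Z_img, Z_txt)
```

Averaging the text-side projections of the nearest image-side neighbors reflects the idea that proximity in the aligned subspace carries across modalities, which is what makes uni-modal completion possible at all.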
