Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval

Cuicui Kang; Shiming Xiang; Shengcai Liao; Changsheng Xu; Chunhong Pan

Abstract

The cross-modal feature matching has gained much attention in recent years, which has many practical applications, such as the text-to-image retrieval. The most difficult problem of cross-modal matching is how to eliminate the heterogeneity between modalities. The existing methods (e.g., CCA and PLS) try to learn a common latent subspace, where the heterogeneity between two modalities is minimized so that cross-matching is possible. However, most of these methods require fully paired samples and suffer difficulties when dealing with unpaired data. Besides, utilizing the class label information has been found as a good way to reduce the semantic gap between the low-level image features and high-level document descriptions. Considering this, we propose a novel and effective supervised algorithm, which can also deal with the unpaired data. In the proposed formulation, the basis matrices of different modalities are jointly learned based on the training samples. Moreover, a local group-based priori is proposed in the formulation to make a better use of popular block based features (e.g., HOG and GIST). Extensive experiments are conducted on four public databases: Pascal VOC2007, LabelMe, Wikipedia, and NUS-WIDE. We also evaluated the proposed algorithm with unpaired data. By comparing with existing state-of-the-art algorithms, the results show that the proposed algorithm is more robust and achieves the best performance, which outperforms the second best algorithm by about 5% on both the Pascal VOC2007 and NUS-WIDE databases.
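
For readers unfamiliar with the subspace baselines named in the abstract, the following is a minimal sketch of text-to-image retrieval with plain linear CCA. It is not the supervised algorithm proposed in the paper, whose formulation is not given in this record. The function names (`fit_cca`, `retrieve`), the synthetic features, the dimensions, and the regularization value are all assumptions made for illustration. Note that this baseline needs fully paired training samples, which is the limitation the abstract points out.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-6):
    """Fit linear CCA on paired rows of X (n, dx) and Y (n, dy).

    Returns the training means, the two projection matrices, and the
    top-k canonical correlations. Hypothetical helper, not from the paper.
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each modality with a Cholesky factor: C = L @ L.T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)        # Lx^{-1} Cxy
    T = np.linalg.solve(Ly, T.T).T      # Lx^{-1} Cxy Ly^{-T}
    # Singular vectors of T give the canonical directions in whitened space.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt[:k].T)
    return mean_x, mean_y, Wx, Wy, s[:k]


def retrieve(query_proj, gallery_proj):
    """Return, for each query row, the index of the most cosine-similar gallery row."""
    q = query_proj / np.linalg.norm(query_proj, axis=1, keepdims=True)
    g = gallery_proj / np.linalg.norm(gallery_proj, axis=1, keepdims=True)
    return np.argmax(q @ g.T, axis=1)


# Synthetic stand-in for paired image/text features (an assumption):
# both modalities are noisy linear views of a shared latent vector z.
rng = np.random.default_rng(0)
n_train, n_test, k = 200, 50, 4
A = rng.normal(size=(k, 10))   # latent -> "image" features
B = rng.normal(size=(k, 8))    # latent -> "text" features


def sample(n):
    z = rng.normal(size=(n, k))
    X = z @ A + 0.01 * rng.normal(size=(n, 10))
    Y = z @ B + 0.01 * rng.normal(size=(n, 8))
    return X, Y


X_train, Y_train = sample(n_train)
X_test, Y_test = sample(n_test)

mean_x, mean_y, Wx, Wy, corrs = fit_cca(X_train, Y_train, k)

# Project held-out data into the common subspace, then match text to images.
img_proj = (X_test - mean_x) @ Wx
txt_proj = (Y_test - mean_y) @ Wy
pred = retrieve(txt_proj, img_proj)
accuracy = float(np.mean(pred == np.arange(n_test)))
print("top-1 text-to-image accuracy:", accuracy)
```

Because both projections land in the same low-dimensional space, a text query can be compared directly with image features by cosine similarity. Real features (e.g., HOG or GIST for images) would replace the synthetic matrices here.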

Bibliographic details

Source
Multimedia, IEEE Transactions on | 2015, Issue 3 | pp. 370-381 | 12 pages
Authors
Cuicui Kang; Shiming Xiang; Shengcai Liao; Changsheng Xu; Chunhong Pan
Author affiliation
Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
Original format: PDF
Text language: English
Keywords
feature extraction; image matching; image representation; image retrieval; learning (artificial intelligence); LabelMe database; NUS-WIDE database; Pascal VOC2007 database; Wikipedia database; block based features; class label information; cross-modal feature matching; cross-modal multimedia retrieval; feature representation learning; high-level document description; latent subspace learning; local group-based priori; low-level image features; modality heterogeneity; supervised learning algorithm; text-to-image retrieval; Algorithm design and analysis; Correlation; Face recognition; Multimedia communication; Semantics; Training; Vectors; Cross-modal matching; documents and images; multimedia; retrieval;
