Computer Vision and Image Understanding

Multimodal recognition of visual concepts using histograms of textual concepts and selective weighted late fusion scheme

Abstract

The text associated with images conveys valuable semantic information about image content that low-level visual features can hardly describe. In this paper, we propose a novel multimodal approach that automatically predicts the visual concepts of images through an effective fusion of textual and visual features. In contrast to the classical Bag-of-Words approach, which relies solely on term frequencies, we propose a novel textual descriptor, the Histogram of Textual Concepts (HTC), which takes the relatedness of semantic concepts into account when accumulating the contributions of the words in an image caption toward a dictionary. In addition to popular SIFT-like features, we evaluate a set of mid-level visual features designed to characterize the harmony, dynamism, and aesthetic quality of visual content in relation to affective concepts. Finally, we propose a novel selective weighted late fusion (SWLF) scheme that, for each concept to be classified, automatically selects the best features and weights their scores. This scheme proves particularly useful for image annotation in a multi-label scenario. Extensive experiments were carried out on the MIR FLICKR image collection within the ImageCLEF 2011 photo annotation challenge. Our best model, a late fusion of textual and visual features, achieved a mean interpolated average precision (MiAP) of 43.69% and ranked 2nd out of 79 runs. We also provide a comprehensive analysis of the experimental results and offer insights for future improvements.
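The abstract describes HTC and SWLF only at a high level. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The relatedness table `RELATEDNESS`, the L1 normalization of the histogram, the rule of keeping a fixed number `n_best` of features, and weighting them by their validation average precision (AP) with a weighted-sum combination are all illustrative choices, and every name here is hypothetical.

```python
# Hypothetical word-to-concept relatedness scores in [0, 1]. HTC relies on a
# semantic relatedness measure; this toy table is an assumption for illustration.
RELATEDNESS = {
    ("beach", "sea"): 0.8,
    ("beach", "sunset"): 0.3,
    ("sun", "sunset"): 0.7,
    ("sun", "sea"): 0.2,
}


def relatedness(word, concept):
    """Return the (assumed) relatedness between a caption word and a concept."""
    if word == concept:
        return 1.0
    return RELATEDNESS.get((word, concept), 0.0)


def histogram_of_textual_concepts(caption_words, dictionary):
    """Sketch of an HTC-style descriptor.

    Unlike Bag-of-Words, which only increments the bin of an exactly matching
    term, every caption word adds its relatedness to every dictionary concept.
    The final L1 normalization is an assumption.
    """
    hist = [0.0] * len(dictionary)
    for word in caption_words:
        for i, concept in enumerate(dictionary):
            hist[i] += relatedness(word, concept)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def swlf_weights(validation_ap, n_best):
    """Sketch of the selection/weighting step of SWLF for one concept.

    validation_ap maps a feature name to its AP on a validation set for this
    concept. Keep the n_best features and weight them by normalized AP
    (assumed rule); fall back to equal weights if every AP is zero.
    """
    ranked = sorted(validation_ap.items(), key=lambda kv: kv[1], reverse=True)[:n_best]
    total = sum(ap for _, ap in ranked)
    if total == 0:
        return {name: 1.0 / len(ranked) for name, _ in ranked}
    return {name: ap / total for name, ap in ranked}


def swlf_fuse(scores, weights):
    """Late fusion: weighted sum of the selected features' classifier scores."""
    return sum(w * scores[name] for name, w in weights.items())


if __name__ == "__main__":
    # Neither "beach" nor "sun" is in the dictionary, so a Bag-of-Words vector
    # would be all zeros, while the HTC-style histogram is not.
    print(histogram_of_textual_concepts(["beach", "sun"], ["sea", "sunset", "mountain"]))
    weights = swlf_weights({"htc": 0.6, "sift": 0.4, "color": 0.2}, n_best=2)
    print(weights, swlf_fuse({"htc": 0.9, "sift": 0.5, "color": 0.1}, weights))
```

Because the weights are computed from validation AP separately for each concept, a concept that is well described by captions can lean on the textual feature while a purely visual concept can favor SIFT-like features, which is the intuition behind selecting and weighting features per concept.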