Journal: Information Retrieval

Multimodal indexing based on semantic cohesion for image retrieval

Abstract

This paper introduces two novel strategies for representing multimodal images, with application to multimedia image retrieval. We consider images that come with both text and labels: text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), while labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in it). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We study standard information fusion techniques for combining both sources of information. However, although the performance of such techniques is highly competitive, they cannot effectively capture the content of images. We therefore propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. These representations build on distributional term representations widely used in computational linguistics: the content of an image is modeled by a distribution of co-occurrences over terms, or of occurrences over other images, so that the representation can be regarded as an expansion of the multimodal terms in the image. We report experimental results on the SAIAPR TC-12 benchmark, using two sets of topics from ImageCLEF competitions, with both manually and automatically generated labels. The results show that the proposed representations significantly outperform both standard multimodal techniques and unimodal methods. Results with manually assigned labels provide an upper bound on the attainable retrieval performance, while results with automatically generated labels are encouraging. The novel representations capture the content of multimodal images more effectively. We emphasize that although we have applied our representations to multimedia image retrieval, the same formulation can be adopted for modeling other multimodal documents (e.g., videos).
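To make the first kind of representation concrete, the following is a minimal sketch (not the authors' implementation) of a distributional term representation over a multimodal vocabulary: each term, whether a caption word or a visual label, is described by its co-occurrence profile across the collection, and a multimodal image is represented by the normalized sum of the profiles of its terms, i.e., an expansion of the image's terms. All identifiers, the `label:` prefix, and the toy data are illustrative assumptions; the second representation described above, a distribution of occurrences over other images, would follow the same pattern with per-term document counts in place of per-term context counts.

```python
from collections import Counter
from typing import Dict, List

def tcor_vectors(docs: List[List[str]]) -> Dict[str, Counter]:
    """For each term, count how often every other term appears in the
    same multimodal document (caption words and visual labels alike)."""
    cooc: Dict[str, Counter] = {}
    for terms in docs:
        uniq = set(terms)
        for t in uniq:
            ctx = cooc.setdefault(t, Counter())
            for u in uniq:
                if u != t:
                    ctx[u] += 1
    return cooc

def expand(doc_terms: List[str], cooc: Dict[str, Counter]) -> Dict[str, float]:
    """Represent a document as a normalized distribution over the vocabulary,
    obtained by summing the co-occurrence profiles of its own terms."""
    acc: Counter = Counter()
    for t in set(doc_terms):
        acc.update(cooc.get(t, Counter()))
    total = float(sum(acc.values())) or 1.0
    return {term: n / total for term, n in acc.items()}

# Toy collection: each "image" mixes caption words with region labels.
collection = [
    ["beach", "sunset", "label:sky", "label:sea"],
    ["mountain", "hike", "label:sky", "label:rock"],
    ["harbour", "boats", "label:sea", "label:boat"],
]
cooc = tcor_vectors(collection)
print(expand(["sunset", "label:sea"], cooc))
```

Two such expanded distributions can then be compared at retrieval time with any standard similarity measure (e.g., cosine similarity) between the query and each indexed image.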