International Conference on Data Mining

Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus

Abstract

We propose an unsupervised approach to learning associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that could serve as a foundation for inter-modality translation and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual features consist of low-level attributes extracted from image content, such as color, texture, and shape. Textual features consist of keywords that provide a description of the images. We assume that a collection of training images is available and that each image is globally annotated by a few keywords. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These profiles would be used to build the multi-modal thesaurus. The proposed approach was trained with a large collection of images, and the constructed thesaurus was used to label new images. Initial experiments indicate that we can achieve up to 71.9% relative improvement in captioning accuracy over the state of the art.
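The abstract does not spell out the learning procedure, but the pipeline it describes, clustering region-level visual features into representative profiles and attaching the keywords that co-occur with them, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the use of k-means, the rule that every region inherits its parent image's global keywords, and the voting scheme in `annotate` are all assumptions, and `build_thesaurus` and `annotate` are hypothetical helpers.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def build_thesaurus(images, n_profiles=50, top_k=3):
    # images: list of (region_features, keywords) pairs, where
    # region_features is an (n_regions, d) float array of low-level
    # attributes (color, texture, shape) and keywords is the image's
    # global annotation.
    all_regions = np.vstack([feats for feats, _ in images])
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0).fit(all_regions)

    # Every region inherits its parent image's global keywords -- a
    # simplifying assumption, since annotations are image-level only.
    counts = [Counter() for _ in range(n_profiles)]
    for feats, keywords in images:
        for label in km.predict(feats):
            counts[label].update(keywords)

    # One thesaurus entry per visual profile: the keywords that
    # co-occur with that cluster most often.
    entries = [counter.most_common(top_k) for counter in counts]
    return km, entries

def annotate(km, entries, region_features, top_k=3):
    # Label a new image by letting each of its regions vote for the
    # keywords of the visual profile it falls into.
    votes = Counter()
    for label in km.predict(region_features):
        for word, count in entries[label]:
            votes[word] += count
    return [word for word, _ in votes.most_common(top_k)]
```

In this sketch each thesaurus entry pairs a visual profile (a cluster of homogeneous regions) with its most frequent keywords, which is the minimal structure needed to translate between the visual and textual modalities in either direction.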
