IEEE International Conference on Data Mining Workshops

Visual Context Learning with Big Data Analytics



Abstract

Understanding contextual information composed of both text and images is very useful for multimedia information processing. However, capturing such contexts is not trivial, especially when dealing with real datasets. Existing solutions such as ontologies (e.g., WordNet) are mainly concerned with individual terms; they do not support identifying a group of terms that describes a specific context available at runtime. To the best of our knowledge, there are very few solutions that integrate contextual information from both images and text. Furthermore, existing solutions are not scalable, due to their computationally intensive tasks, and are prone to data sparsity. In this paper, we propose a semantic framework, called VisContext, that is based on a contextual model combining images and text. The VisContext framework rests on a scalable pipeline composed of the following primary components: (i) Natural Language Processing (NLP); (ii) feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF); (iii) feature association using unsupervised learning algorithms, including K-Means clustering (KM) and Expectation-Maximization (EM); and (iv) validation of visual context models using supervised learning algorithms (Naïve Bayes, Decision Trees, Random Forests). The proposed VisContext framework has been implemented with Spark MLlib and CoreNLP. We have evaluated the effectiveness of the framework for visual understanding on three large datasets (IAPR, Flick3k, SBU) containing more than one million images and their annotations. We report results on the discovery of contextual associations between terms and images, image context visualization, and context-based image classification.
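The core of the pipeline described above — TF-IDF feature extraction over image annotations followed by K-Means clustering to associate terms into contexts — can be sketched in a few lines of plain Python. This is an illustrative toy, not the paper's Spark MLlib implementation: the annotation strings, the cosine-similarity K-Means variant, and the deterministic centroid initialization (real K-Means typically initializes randomly) are all assumptions made here to keep the example self-contained.

```python
import math
from collections import Counter

# Toy image annotations standing in for the paper's datasets
# (IAPR, Flick3k, SBU); the captions below are invented for illustration.
docs = [
    "beach sand ocean wave sun",
    "ocean wave surf beach",
    "mountain snow ski slope",
    "snow mountain peak slope",
]

def tfidf(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vectors, k, iters=10):
    """K-Means over sparse TF-IDF vectors, using cosine similarity.

    Centroids are seeded deterministically from the first and last
    documents so the toy run is reproducible.
    """
    centroids = [vectors[0], vectors[-1]][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            clusters[best].append(v)
        for i, members in enumerate(clusters):  # mean of member vectors
            if members:
                merged = Counter()
                for m in members:
                    merged.update(m)
                centroids[i] = {t: w / len(members) for t, w in merged.items()}
    return [max(range(k), key=lambda i: cosine(v, centroids[i])) for v in vectors]

labels = kmeans(tfidf(docs), k=2)
print(labels)  # → [0, 0, 1, 1]: beach/ocean captions vs. mountain/snow captions
```

Each resulting cluster groups the annotation terms that co-occur across images — the "group of terms that describes a specific context" the abstract refers to; validating such clusters with a supervised classifier is step (iv) of the pipeline.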
