IAPR International Conference on Document Analysis and Recognition

Multimodal Classification Fusion in Real-World Scenarios



Abstract

In this paper, we propose a multimodal setting for real-world scenarios based on weighting and meta-learning combination methods that integrate the output probabilities obtained from text and visual classifiers. While a classifier built on the concatenation of text and visual features may worsen the results, the model described in this paper can increase classification accuracy by more than 6%. Typically, either text or images are used for classification; however, ambiguity in the text or the image may reduce performance. This motivates combining the text and image of an object or concept in a multimodal approach to enhance performance. In our approach, a text classifier is trained on Bag-of-Words features and a visual classifier is trained on features extracted through a Deep Convolutional Neural Network. We created a new dataset of real-world texts and images called Ferramenta. Some of the images and related texts in this dataset contain ambiguities, which makes it an ideal setting for testing a multimodal approach. Experimental results reported on the Ferramenta and PASCAL VOC2007 datasets indicate that the described combination methods perform better in a multimodal setting.
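To make the fusion step concrete, the following is a minimal sketch of late fusion by a weighted combination of the output probabilities of a text and a visual classifier, as the abstract describes. The weight value, array shapes, and toy numbers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def weighted_fusion(p_text: np.ndarray, p_visual: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Combine per-class probabilities from a text and a visual classifier.

    p_text, p_visual: arrays of shape (n_samples, n_classes) holding the
    class probabilities produced by the two unimodal classifiers.
    alpha: weight given to the text modality; (1 - alpha) goes to the visual one.
    """
    fused = alpha * p_text + (1.0 - alpha) * p_visual
    # Renormalize each row so the fused scores remain a probability distribution.
    return fused / fused.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    # Toy example with 2 samples and 3 classes: the text classifier is
    # ambiguous on the first sample, the visual classifier on the second.
    p_text = np.array([[0.40, 0.35, 0.25],
                       [0.10, 0.80, 0.10]])
    p_visual = np.array([[0.70, 0.20, 0.10],
                         [0.30, 0.40, 0.30]])
    fused = weighted_fusion(p_text, p_visual, alpha=0.5)
    print(fused.argmax(axis=1))  # fused class predictions per sample
```

The weight alpha can be fixed or tuned on a validation set; the paper's meta-learning variant instead trains a second-level classifier on the concatenated output probabilities of the two unimodal models.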
