IAPR International Workshop on Graphics Recognition

Multimodal Classification of Document Embedded Images


Abstract

Images embedded in documents carry rich information that is vital for content extraction and knowledge construction. Interpreting the information in diagrams, scanned tables, and other types of images enriches the underlying concepts, but it requires a classifier that can recognize the huge variability of potential embedded image types and enable reconstruction of their relationships. Here we tested different deep learning-based approaches for image classification on a dataset of 32K images extracted from documents and divided into 62 categories, on which we obtained an accuracy of ~85%. We also investigated to what extent textual information improves classification performance when combined with visual features. The textual features were obtained either from text embedded in the images or from image captions. Our findings suggest that textual information carries relevant information about the image category and that multimodal classification provides up to 7% better accuracy than single-data-type classification.
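The idea of combining visual and textual features can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not describe the fusion strategy, so this example assumes simple feature concatenation, and it swaps in a nearest-centroid classifier in place of a deep network to stay self-contained. All function names (`l2_normalize`, `fuse`, `fit_centroids`, `predict`) and the toy feature vectors are hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length so neither modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(visual, textual):
    """Assumed fusion step: normalize each modality, then concatenate."""
    return l2_normalize(visual) + l2_normalize(textual)


def fit_centroids(features, labels):
    """Compute one mean feature vector per class (stand-in for a trained classifier)."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, feature):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], feature))


# Toy data (invented for illustration): "chart" and "table" look almost
# identical visually, but their textual features differ.
train = [
    ("chart", [1.0, 0.0], [1.0, 0.0]),
    ("table", [1.0, 0.1], [0.0, 1.0]),
    ("photo", [0.0, 1.0], [0.0, 0.0]),  # no text available for this image
]
fused_train = [fuse(visual, textual) for _, visual, textual in train]
centroids = fit_centroids(fused_train, [label for label, _, _ in train])

# A query whose visual features sit between "chart" and "table".
query = fuse([1.0, 0.05], [0.0, 1.0])
prediction = predict(centroids, query)
print(prediction)
```

In this toy setup the textual part of the fused vector separates two classes that the visual part alone barely distinguishes, which is the kind of effect the abstract attributes to multimodal classification. Concatenation is only one possible design; weighting the modalities or fusing classifier scores are alternatives the abstract neither confirms nor rules out.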
