Multimodal Classification of Document Embedded Images


Abstract

Images embedded in documents carry extremely rich information that is vital for content extraction and knowledge construction. Interpreting the information in diagrams, scanned tables, and other types of images enriches the underlying concepts, but it requires a classifier that can recognize the huge variability of embedded image types and enable reconstruction of their relationships. Here we tested different deep learning-based approaches for image classification on a dataset of 32K images extracted from documents and divided into 62 categories, for which we obtain an accuracy of approximately 85%. We also investigate to what extent textual information improves classification performance when combined with visual features. The textual features were obtained either from text embedded in the images or from image captions. Our findings suggest that textual information carries relevant information about the image category and that multimodal classification provides up to 7% better accuracy than classification based on a single data type.
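The abstract does not say how the visual and textual features are combined, so the following is only a minimal sketch of one common option, feature-level fusion by concatenation, and should not be read as the authors' method. A nearest-centroid classifier stands in for the deep learning models, the feature vectors are made-up toy values, and the names `fuse`, `centroids`, and `predict` are hypothetical.

```python
from math import dist


def fuse(visual, textual):
    """Feature-level fusion (an assumption): concatenate both feature vectors."""
    return list(visual) + list(textual)


def centroids(samples):
    """Compute the mean feature vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(cents, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda label: dist(cents[label], vec))


# Toy data: the visual features barely separate the two classes,
# while the textual features (e.g. derived from a caption) do.
train = [
    (fuse([1.0, 0.0], [1.0, 0.0]), "table"),
    (fuse([1.0, 0.1], [0.9, 0.0]), "table"),
    (fuse([1.0, 0.0], [0.0, 1.0]), "chart"),
    (fuse([0.9, 0.1], [0.0, 0.9]), "chart"),
]
model = centroids(train)
label = predict(model, fuse([1.0, 0.05], [0.0, 1.0]))
```

In this toy setup the visual part of the query is nearly identical for both classes, so the prediction is driven by the textual part. That is the kind of complementary signal the abstract attributes to multimodal classification.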

Bibliographic details

Source: IAPR International Workshop on Graphics Recognition | 2018 | 168p | 9 pages in total
Authors: Matheus Viana; Quoc-Bao Nguyen; John Smith; Maria Gabrani
Original format: PDF
Chinese Library Classification: TP391.41-53

