IEEE Conference on Computer Vision and Pattern Recognition

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks


Abstract

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
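The sketch below illustrates, in PyTorch, the multimodal idea described in the abstract: a fully convolutional network that classifies each pixel by fusing visual features from the page image with a per-pixel text embedding map derived from the underlying text. The class name MultimodalFCN, the layer sizes, and the text_map input are illustrative assumptions, not the authors' published architecture.

# A minimal, hypothetical sketch of the multimodal pixel-wise segmentation
# idea: fuse visual features with a per-pixel text embedding map.
# This is NOT the authors' exact architecture.
import torch
import torch.nn as nn

class MultimodalFCN(nn.Module):
    def __init__(self, num_classes=4, text_dim=32):
        super().__init__()
        # Visual branch: a small encoder-decoder over the RGB page image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Fusion head: concatenate visual features with the text embedding
        # map (text_dim channels, same spatial size as the image) and
        # predict a class label per pixel.
        self.head = nn.Conv2d(32 + text_dim, num_classes, 1)

    def forward(self, image, text_map):
        x = self.decoder(self.encoder(image))   # B x 32 x H x W
        x = torch.cat([x, text_map], dim=1)     # append text channels
        return self.head(x)                     # B x num_classes x H x W

# Usage: image is B x 3 x H x W; text_map is B x text_dim x H x W, e.g. a
# word embedding broadcast over each word's bounding box on the page.
model = MultimodalFCN()
logits = model(torch.randn(1, 3, 64, 64), torch.randn(1, 32, 64, 64))
print(logits.shape)  # torch.Size([1, 4, 64, 64])

Concatenating the text embedding map as extra input channels is only one plausible way to realize the "content of underlying text" signal mentioned in the abstract; the paper studies the optimum architecture, which may fuse the modalities differently.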
