IEEE Conference on Computer Vision and Pattern Recognition

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks


Abstract

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover, we propose an efficient synthetic document generation process that we use to generate pretraining data for our network. Once the network is trained on a large set of synthetic documents, we fine-tune the network on unlabeled real documents using a semi-supervised approach. We systematically study the optimum network architecture and show that both our multimodal approach and the synthetic data pretraining significantly boost the performance.
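The sketch below illustrates, in PyTorch, the multimodal idea described in the abstract: a fully convolutional network that classifies each pixel by fusing visual features from the page image with a per-pixel text embedding map derived from the underlying text. The class name MultimodalFCN, the layer sizes, and the text_map input are illustrative assumptions, not the authors' published architecture.

# A minimal, hypothetical sketch of the multimodal pixel-wise segmentation
# idea: fuse visual features with a per-pixel text embedding map.
# This is NOT the authors' exact architecture.
import torch
import torch.nn as nn

class MultimodalFCN(nn.Module):
    def __init__(self, num_classes=4, text_dim=32):
        super().__init__()
        # Visual branch: a small encoder-decoder over the RGB page image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Fusion head: concatenate visual features with the text embedding
        # map (text_dim channels, same spatial size as the image) and
        # predict a class label per pixel.
        self.head = nn.Conv2d(32 + text_dim, num_classes, 1)

    def forward(self, image, text_map):
        x = self.decoder(self.encoder(image))   # B x 32 x H x W
        x = torch.cat([x, text_map], dim=1)     # append text channels
        return self.head(x)                     # B x num_classes x H x W

# Usage: image is B x 3 x H x W; text_map is B x text_dim x H x W, e.g. a
# word embedding broadcast over each word's bounding box on the page.
model = MultimodalFCN()
logits = model(torch.randn(1, 3, 64, 64), torch.randn(1, 32, 64, 64))
print(logits.shape)  # torch.Size([1, 4, 64, 64])

Concatenating the text embedding map as extra input channels is only one plausible way to realize the "content of underlying text" signal mentioned in the abstract; the paper studies the optimum architecture, which may fuse the modalities differently.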
