Document Structure Analysis Using Back-Propagation Network


Abstract

The digitization of paper-based documents is an important issue in the information era. Digitization here means the process of scanning paper documents, analyzing the layout of the document image, and converting it into text by Optical Character Recognition (OCR). Because text recognized by OCR does not contain structural information, a hierarchical structure cannot be derived from the digitized document. In this paper, we propose a document structure analysis technique based on layout information and neural networks. The objective is to classify the blocks in a document, such as title blocks and paragraph blocks, into a structural hierarchy and to convert it automatically into an XML (Extensible Markup Language) document. Conference papers are used in our experiments, in which the back-propagation network (BPN) model achieves a correctness rate of 94.44%. The experimental results validate the feasibility of the proposed scheme.
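The idea of classifying layout blocks with a back-propagation network and emitting XML can be illustrated with a minimal sketch. The abstract does not give the paper's features, network size, or XML schema, so everything below is an assumption made for illustration: the three layout features (relative font size, relative block height, centered flag), the toy training data, the single hidden layer of four sigmoid units, and the element names `document`, `section`, `title`, and `paragraph`.

```python
import math
import random
import xml.etree.ElementTree as ET

LABELS = ["title", "paragraph"]

# Hypothetical training blocks: ([font size, block height, centered], label index).
BLOCKS = [
    ([1.0, 0.2, 1.0], 0), ([0.9, 0.1, 1.0], 0), ([0.8, 0.2, 1.0], 0),
    ([0.3, 0.8, 0.0], 1), ([0.2, 0.9, 0.0], 1), ([0.3, 0.7, 0.0], 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Return hidden and output activations; the last weight in each row is a bias."""
    w1, w2 = net
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w2]
    return h, o

def train_bpn(samples, n_hidden=4, lr=0.5, epochs=3000, seed=0):
    """Train a one-hidden-layer network with online back-propagation (squared error)."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(LABELS)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, label in samples:
            h, o = forward((w1, w2), x)
            target = [1.0 if k == label else 0.0 for k in range(n_out)]
            # Output deltas: error times the sigmoid derivative.
            d_out = [(o[k] - target[k]) * o[k] * (1 - o[k]) for k in range(n_out)]
            # Hidden deltas: output deltas propagated back through w2.
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w2[k][j] for k in range(n_out))
                     for j in range(n_hidden)]
            for k in range(n_out):
                for j, v in enumerate(h + [1.0]):
                    w2[k][j] -= lr * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w1[j][i] -= lr * d_hid[j] * v
    return w1, w2

def predict(net, x):
    """Return the index of the most active output unit."""
    _, o = forward(net, x)
    return max(range(len(o)), key=o.__getitem__)

def blocks_to_xml(texts, labels):
    """Nest each paragraph under the most recent title (assumed two-level hierarchy)."""
    root = ET.Element("document")
    section = None
    for text, label in zip(texts, labels):
        if LABELS[label] == "title":
            section = ET.SubElement(root, "section")
            ET.SubElement(section, "title").text = text
        else:
            parent = section if section is not None else root
            ET.SubElement(parent, "paragraph").text = text
    return ET.tostring(root, encoding="unicode")
```

In use, `train_bpn(BLOCKS)` returns the weights, `predict` labels each block from its feature vector, and `blocks_to_xml` turns the block texts and predicted labels into an XML string. A real system would need many more block classes and features extracted from the layout analysis of scanned pages.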
