Conference: World multiconference on systemics, cybernetics and informatics
Document Structure Analysis Using Back-Propagation Network

Abstract

Digitizing paper-based documents is an important problem in the information era. Digitization here means scanning paper documents, analyzing the layout of the document image, and converting it into text with Optical Character Recognition (OCR). Because text recognized by OCR carries no structural information, the hierarchical structure of the document cannot be recovered from the digitized text alone. In this paper, we propose a document structure analysis technique based on layout information and neural networks. The objective is to classify the blocks in a document, such as title blocks and paragraph blocks, into a structural hierarchy and to convert the result into an XML (Extensible Markup Language) document automatically. Conference papers are used in our experiments, in which the back-propagation network (BPN) model achieves a correctness rate of 94.44%. The experimental results validate the feasibility of the proposed scheme.
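The pipeline described in the abstract (classify layout blocks with a back-propagation network, then emit XML) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the three block labels, the three layout features (relative font size, vertical position, text density), the toy training data, the network size, and the XML element names. The network is a generic one-hidden-layer sigmoid network trained with plain back-propagation on squared error, written with the Python standard library only.

```python
import math
import random
import xml.etree.ElementTree as ET

# Hypothetical block classes; the paper's actual label set is not given in the abstract.
LABELS = ["title", "heading", "paragraph"]

# Hypothetical toy data: [relative font size, vertical position, text density] -> label.
TRAIN = [
    ([1.0, 0.0, 0.1], "title"),
    ([0.6, 0.3, 0.1], "heading"),
    ([0.3, 0.4, 0.9], "paragraph"),
    ([0.6, 0.7, 0.1], "heading"),
    ([0.3, 0.8, 0.8], "paragraph"),
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_hot(label):
    return [1.0 if name == label else 0.0 for name in LABELS]


class TinyBPN:
    """One-hidden-layer sigmoid network trained with plain back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(y[k] - target[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
        # Hidden deltas: propagate the output deltas back through w2 (before updating it).
        d_hid = [
            h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
            for j in range(len(h))
        ]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        # Squared error measured before this update.
        return sum((y[k] - target[k]) ** 2 for k in range(len(y)))

    def predict(self, x):
        _, y = self.forward(x)
        return LABELS[y.index(max(y))]


def train(net, data, epochs=500):
    """Run per-sample gradient descent; return the summed error of each epoch."""
    losses = []
    for _ in range(epochs):
        losses.append(sum(net.train_step(x, one_hot(label)) for x, label in data))
    return losses


def blocks_to_xml(blocks):
    """Turn (label, text) pairs in reading order into a nested XML string."""
    root = ET.Element("document")
    section = None
    for label, text in blocks:
        if label == "title":
            ET.SubElement(root, "title").text = text
        elif label == "heading":
            # A heading opens a new section; later paragraphs attach to it.
            section = ET.SubElement(root, "section")
            ET.SubElement(section, "heading").text = text
        else:
            parent = section if section is not None else root
            ET.SubElement(parent, "paragraph").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    net = TinyBPN(n_in=3, n_hidden=4, n_out=3)
    train(net, TRAIN)
    texts = ["Paper title", "1 Introduction", "Body text ...", "2 Method", "More body text ..."]
    labelled = [(net.predict(x), t) for (x, _), t in zip(TRAIN, texts)]
    print(blocks_to_xml(labelled))
```

In this sketch the hierarchy is derived by a simple rule (each heading opens a section that collects the following paragraphs); the abstract does not say how the paper builds the hierarchy from the classified blocks, so that rule is only one plausible choice.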