Conference: International Conference on Image Information Processing

Automatic table detection and retention from scanned document images via analysis of structural information

Abstract

Automatic table detection is a long-standing and much-debated problem in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance, and republishing. Because tables are non-textual objects, they prevent OCR systems from digitizing a document perfectly and distort the layout and structure of the digitized output. No available algorithm or method solves this problem for all possible types of tables. This paper tackles table detection and retention with a bi-modular approach based on the structural information of tables: bounding lines, row/column separators, and the space between columns. By analyzing these properties, our experiments on a dataset of more than 600 images containing more than 829 tables detected 90% of the tables correctly.
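The structural cues named in the abstract (bounding lines, row/column separators, and the space between columns) can be illustrated with a small sketch. This is not the paper's bi-modular algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: the page is already binarized into a list of rows of 0/1 pixels (1 = ink), and the function names and the thresholds `min_fraction` and `min_gap` are hypothetical choices made for this example.

```python
def longest_run(values):
    """Length of the longest run of consecutive ink pixels (1s)."""
    best = current = 0
    for v in values:
        current = current + 1 if v else 0
        best = max(best, current)
    return best


def find_ruling_lines(image, min_fraction=0.8):
    """Rows and columns whose longest ink run covers at least
    min_fraction of the image width (rows) or height (columns).
    The threshold is an illustrative assumption."""
    height, width = len(image), len(image[0])
    rows = [y for y in range(height)
            if longest_run(image[y]) >= min_fraction * width]
    cols = [x for x in range(width)
            if longest_run([image[y][x] for y in range(height)])
            >= min_fraction * height]
    return rows, cols


def find_column_gaps(image, min_gap=2):
    """(start, end) x-ranges, end exclusive, of at least min_gap
    consecutive ink-free columns. Gaps touching the left or right
    edge are treated as page margins and skipped."""
    height, width = len(image), len(image[0])
    # Vertical projection profile: ink count per pixel column.
    profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    gaps, start = [], None
    for x, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = x
        elif start is not None:
            if start > 0 and x - start >= min_gap:
                gaps.append((start, x))
            start = None
    return gaps


# A 7x9 ruled table: outer border plus one row and one column separator.
ruled = [[1 if (y in (0, 3, 6) or x in (0, 4, 8)) else 0 for x in range(9)]
         for y in range(7)]

# Two blocks of "text" separated by white space, with no ruling lines.
borderless = [
    [1, 1, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 1, 0],
]

ruled_rows, ruled_cols = find_ruling_lines(ruled)
column_gaps = find_column_gaps(borderless)
```

Long ink runs stand in for bounding lines and separators in ruled tables, while the projection profile captures the inter-column white space that matters for tables without rulings. A real scanned page would additionally need binarization, deskewing, and noise tolerance, none of which this sketch attempts.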
