Automatic table detection and retention from scanned document images via analysis of structural information

Abstract

The problem of automatic table detection has long been a major topic of debate in the field of Document Analysis and Recognition (DAR). Digital documents are more efficient than their printed counterparts for storage, maintenance, and republishing. Because tables are non-textual objects within a document, they prevent OCR systems from digitizing a document perfectly and distort the layout and structure of the digitized document. No available algorithm or method solves this problem for all possible types of tables. This paper tackles the problem of table detection and retention by proposing a bi-modular approach based on the structural information of tables. This structural information includes bounding lines, row/column separators, and the space between columns. By analyzing these properties, our approach correctly detected 90% of the tables in experiments on a dataset of more than 600 images containing more than 829 tables.
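The abstract lists the structural cues used but does not spell out the algorithm. As an illustration only, and not the authors' bi-modular method, the sketch below shows one common way such cues can be measured: projection profiles (per-row and per-column ink counts) on a binarized image. Ruled tables show up as rows or columns that are almost entirely ink (bounding lines and row/column separators); unruled tables show up as interior runs of empty columns (space between columns). The image representation, the function names, and the thresholds `min_fill` and `min_gap` are hypothetical assumptions.

```python
from typing import List, Tuple

# Hypothetical representation: a pre-binarized page image as rows of 0/1,
# where 1 marks an ink (foreground) pixel and 0 marks background.
Image = List[List[int]]


def row_profile(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def col_profile(img: Image) -> List[int]:
    """Vertical projection profile: number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def ruling_rows(img: Image, min_fill: float = 0.8) -> List[int]:
    """Rows whose ink covers at least min_fill of the width (candidate horizontal lines)."""
    width = len(img[0])
    return [y for y, n in enumerate(row_profile(img)) if n >= min_fill * width]


def ruling_cols(img: Image, min_fill: float = 0.8) -> List[int]:
    """Columns whose ink covers at least min_fill of the height (candidate vertical lines)."""
    height = len(img)
    return [x for x, n in enumerate(col_profile(img)) if n >= min_fill * height]


def column_gaps(img: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Interior runs of at least min_gap empty columns (candidate whitespace separators).

    Blank margins left of the first ink column and right of the last one are ignored.
    Returns inclusive (start, end) column indices.
    """
    profile = col_profile(img)
    ink = [x for x, n in enumerate(profile) if n > 0]
    if not ink:
        return []
    gaps: List[Tuple[int, int]] = []
    start = None
    for x in range(ink[0], ink[-1] + 1):
        if profile[x] == 0:
            if start is None:
                start = x  # a run of empty columns begins here
        else:
            if start is not None and x - start >= min_gap:
                gaps.append((start, x - 1))
            start = None
    return gaps


def parse(rows: List[str]) -> Image:
    """Helper for toy examples: '#' is ink, anything else is background."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


if __name__ == "__main__":
    # A ruled 2x3 table: borders and separators are drawn lines.
    ruled = parse(["#######",
                   "#.#.#.#",
                   "#######",
                   "#.#.#.#",
                   "#######"])
    print(ruling_rows(ruled), ruling_cols(ruled))  # [0, 2, 4] [0, 2, 4, 6]

    # An unruled table: columns are separated only by whitespace.
    unruled = parse(["##..##..##",
                     "#...##...#"])
    print(ruling_rows(unruled), column_gaps(unruled))  # [] [(2, 3), (6, 7)]
```

On a real page, binarization, deskewing, and restriction to a candidate region would be needed first, since ordinary multi-column body text also produces column gaps; those steps are outside this sketch.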

Bibliographic record

Source: International Conference on Image Information Processing, 2017, pp. 1-6 (6 pages)
Authors: Varsha Ranka; Shubham Patil; Shubham Patni; Tushar Raut; Kapil Mehrotra; Manish Kumar Gupta
Original format: PDF
Keywords: Layout; Transforms; Optical character recognition software; Information processing; Text analysis; Particle separators; Histograms

Similar literature

1. Automatic Abstraction of Combinational Logic Circuit from Scanned Document Page Images [J]. Ramanath Datta, Sekhar Mandal, Samit Biswas. Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR. 2019, No. 2

2. Automatic localization and extraction of tables from handheld mobile-camera captured handwritten document images [J]. Amarnath R., Sindhushree G. S., Nagabhushan P. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2019, No. 3

3. Using ImageMagick to Automatically Increase Legibility of Scanned Text Documents [J]. Belfiore Doreva. Code4Lib Journal. 2011, No. 14

4. Automatic table detection and retention from scanned document images via analysis of structural information [C]. Varsha Ranka, Shubham Patil, Shubham Patni. International Conference on Image Information Processing. 2017

5. Automatic image registration and defect identification of a class of structural artifacts in printed documents [D]. Chandu, Kartheek. 2008

6. RAC-CNN: multimodal deep learning based automatic detection and classification of rod and cone photoreceptors in adaptive optics scanning light ophthalmoscope images [O]. David Cunefare, Alison L. Huckenpahler, Emily J. Patterson. 2019

7. Learning to Detect Tables in Scanned Document Images using Line Information [O]. Kasar, Thotreingam, Barlas, Philippine, Sébastien, Adam. 2013

8. Automatic Detection of Sand Ripple Features in Sidescan Sonar Imagery [R]. Crawford, A., Skarke, A. 2014
