ClusTi: Clustering Method for Table Structure Recognition in Scanned Images

Zucker Arthur; Belkada Younes; Hanh Vu; Van Nam Nguyen


Abstract

OCR (Optical Character Recognition) for scanned paper invoices is very challenging due to the variability of 19 invoice layouts, different information fields, large data tables, and low scanning quality. In this setting, table structure recognition is a critical task in which all rows, columns, and cells must be accurately located and extracted. Existing methods such as DeepDeSRT have only dealt with high-quality born-digital images (e.g., PDF) with low noise and an apparent table structure. This paper proposes an efficient method called CluSTi (Clustering method for recognition of the Structure of Tables in invoice scanned Images). The contributions of CluSTi are three-fold. Firstly, it removes heavy noise from the table images using a clustering algorithm. Secondly, it extracts all text boxes using state-of-the-art text recognition. Thirdly, based on horizontal and vertical clustering algorithms with optimized parameters, CluSTi groups the text boxes into their correct rows and columns, respectively. The method was evaluated on three datasets: i) 397 public scanned images; ii) 193 PDF document images from the ICDAR 2013 competition dataset; and iii) 281 PDF document images from ICDAR 2019's numeric tables. The evaluation results showed that CluSTi achieved F1-scores of 87.5%, 98.5%, and 94.5%, respectively. Our method also outperformed DeepDeSRT with an F1-score of 91.44% on only 34 images from the ICDAR 2013 competition dataset. To the best of our knowledge, CluSTi is the first method to tackle the table structure recognition problem on scanned images.
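
The third step described in the abstract, grouping text boxes into rows and columns by clustering along each axis, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm or parameters CluSTi uses, so the sketch substitutes a simple gap-threshold clustering on one-dimensional coordinates. The names `cluster_1d`, `assign_cells`, `row_gap`, and `col_gap` are hypothetical, and the box format `(x_min, y_min, x_max, y_max)` is an assumption.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def cluster_1d(values: List[float], max_gap: float) -> List[int]:
    """Label each value so that sorted neighbours whose difference is
    at most max_gap share a label. Labels increase with the coordinate.
    This is a stand-in for whatever clustering the paper actually uses."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, idx in zip(order, order[1:]):
        if values[idx] - values[prev] > max_gap:
            current += 1  # gap too large: start a new cluster
        labels[idx] = current
    return labels


def assign_cells(boxes: List[Box], row_gap: float, col_gap: float) -> List[Tuple[int, int]]:
    """Return a (row, column) index for every text box.
    Rows come from clustering vertical centres, columns from
    clustering left edges (an assumption suited to left-aligned cells)."""
    y_centers = [(b[1] + b[3]) / 2 for b in boxes]
    x_lefts = [b[0] for b in boxes]
    rows = cluster_1d(y_centers, row_gap)
    cols = cluster_1d(x_lefts, col_gap)
    return list(zip(rows, cols))


# Four text boxes forming a slightly misaligned 2x2 table.
boxes = [
    (10, 10, 60, 30),
    (120, 12, 170, 32),
    (10, 50, 70, 70),
    (118, 51, 160, 71),
]
print(assign_cells(boxes, row_gap=10, col_gap=20))
# -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The two thresholds stand in for the "optimized parameters" the abstract mentions; in practice they would have to be tuned to the scan resolution and font size, and right-aligned numeric columns would likely need a different column feature than the left edge.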

Bibliographic record

Source
Mobile Networks & Applications | 2021, Issue 4 | pp. 1765-1776 | 12 pages
Authors
Zucker Arthur; Belkada Younes; Hanh Vu; Van Nam Nguyen
Author affiliations

Sorbonne Univ Polytech Sorbonne F-75005 Paris France;

Sorbonne Univ Polytech Sorbonne F-75005 Paris France;

Viettel CyberSpace Ctr 41st Floor Keangnam Landmark 72 Hanoi Vietnam;

Thuyloi Univ Comp Sci & Engn Dept 175 TaySon Hanoi Vietnam;

Original format: PDF
Language: English
Keywords
Table structure recognition; Object recognition; Clustering method;


