Classification and Information Extraction for Complex and Nested Tabular Structures in Images



Abstract

Understanding technical documents, such as manuals, is one of the most important steps in automatic reporting and/or troubleshooting of defects. The majority of the relevant information exists in tabular structures. There are some solutions for extracting tabular structures from text. However, extracting tabular information from images, and in particular from complex and nested tables, remains a major challenge. This paper proposes classification and information extraction methods for complex tabular structures in document images. These are hybrid approaches that use both image layout and OCRed text. On a real-world dataset of technical documents from a German railway company (Deutsche Bahn AG), the proposed methods outperform other state-of-the-art approaches. As a result, the proposed approaches won the competition held by Deutsche Bahn AG in 2016 against other participating research groups and companies.


Bibliographic record

Source
IAPR International Conference on Document Analysis and Recognition | 2017 | 732p | 6 pages
Conference location
Authors
Amir Riad; Christian Sporer; Syed Saqib Bukhari; Andreas Dengel;
Author affiliations

Conference organizer
Original format: PDF
Text language
Chinese Library Classification: TP391.41-53;
Keywords
Information retrieval; Optical character recognition software; Layout; Data mining; Companies; Image resolution; Labeling;


Similar literature


1. Holistic design for deep learning-based discovery of tabular structures in datasheet images [J] . Ertugrul Kara, Mark Traquair, Murat Simsek, Engineering Applications of Artificial Intelligence . 2020, Issue Apra

2. A Unified Algorithm for Identification of Various Tabular Structures from Document Images [J] . Sekhar Mandal, Amit K. Das, Partha Bhowmick, International Journal of Digital Library Systems . 2011, Issue 6

3. Complex networks-based texture extraction and classification method for mineral flotation froth images [J] . Xu Degang, Chen Xiao, Xie Yongfang, Minerals Engineering . 2015

4. Classification and Information Extraction for Complex and Nested Tabular Structures in Images [C] . Amir Riad, Christian Sporer, Syed Saqib Bukhari, IAPR International Conference on Document Analysis and Recognition . 2017

5. Extraction des structures linéaires à partir des images satellitaires à très haute résolution pour l'aide à la gestion des catastrophes majeures =Linear Structures Extraction from Very High Resolution Satellite Images to Support Major Disasters Management [D] . Ouled Sghaier, Moslem. 2017

6. A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images [O] . Léo Varnet, Kenneth Knoblauch, Willy Serniclaes

7. METHOD OF CLASSIFICATION OF COMPLEX STRUCTURED IMAGES ON THE BASIS OF SELF-ORGANIZED NEURAL NETWORK STRUCTURES [O] . S. A. Filist, R. A. Tomakova, O. V. Shatalova, 2016

8. COMPUTER CLASSIFICATION OF REMOTELY SENSED MULTISPECTRAL IMAGE DATA BY EXTRACTION AND CLASSIFICATION OF HOMOGENEOUS OBJECTS [R] . ROBERT L. KETTIG, 1975

