IAPR International Conference on Document Analysis and Recognition

Classification and Information Extraction for Complex and Nested Tabular Structures in Images

Abstract

Understanding technical documents, such as manuals, is one of the most important steps in the automatic reporting and/or troubleshooting of defects. Most of the relevant information in these documents is held in tabular structures. Solutions exist for extracting tabular structures from text, but extracting tabular information from images remains difficult, particularly when the tables are complex and nested. This paper proposes classification and information extraction methods for complex tabular structures in document images. The methods are hybrid approaches that use both the image layout and the OCRed text. On a real-world dataset of technical documents from a German railway company (Deutsche Bahn AG), the proposed methods outperform other state-of-the-art approaches. As a result, the proposed approaches won the competition held by Deutsche Bahn AG in 2016 against other participating research groups and companies.