TAO: System for Table Detection and Extraction from PDF Documents


Abstract

Digital documents present knowledge in most areas of study and allow information to be exchanged and communicated in a portable way. To make better use of the knowledge embedded in this ever-growing information source, effective tools for automatic information extraction are needed. Tables are crucial information elements in scientific documents: most publications use them to represent and report concrete research findings. Current methods for extracting table data from PDF documents lack precision in detecting, extracting, and representing data from diverse layouts. We present TAble Organization (TAO), a system that automatically detects, extracts, and organizes information from tables in PDF documents. TAO uses a processing approach based on the k-nearest neighbor method and layout heuristics to detect tables within a document and to extract table information. The system generates an enriched representation of the data extracted from tables in PDF documents. TAO's performance is comparable to that of other table extraction methods, but it overcomes some limitations of related work and proves more robust in experiments with diverse document layouts.


Bibliographic record

Source
International Florida Artificial Intelligence Research Society Conference | 2016 | 718p | 6 pages
Authors
Martha O. Perez-Arriaga; Trilce Estrada; Soraya Abad-Mota
Original format: PDF
Chinese Library Classification: TP18-53
Date indexed: 2022-08-21 04:32:00

