International Florida Artificial Intelligence Research Society Conference

TAO: System for Table Detection and Extraction from PDF Documents



Abstract

Digital documents present knowledge in most areas of study, exchanging and communicating information in a portable way. To better use the knowledge embedded in this ever-growing information source, effective tools for automatic information extraction are needed. Tables are crucial information elements in scientific documents. Most publications use tables to represent and report concrete research findings. Current methods for extracting table data from PDF documents lack precision in detecting, extracting, and representing data from diverse layouts. We present TAble Organization (TAO), a system that automatically detects, extracts, and organizes information from tables in PDF documents. TAO uses a processing approach based on the k-nearest neighbor method and layout heuristics to detect tables within a document and to extract table information. The system generates an enriched representation of the data extracted from tables in PDF documents. TAO's performance is comparable to that of other table extraction methods, but it overcomes some limitations of related work and proves more robust in experiments with diverse document layouts.
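The abstract does not say how TAO applies the k-nearest neighbor method, so the following is only an illustrative sketch of the general idea, not TAO's implementation. It assumes each text line is reduced to two hypothetical features (the count of wide horizontal gaps between words, and the fraction of numeric tokens) and labels a line as "table" or "text" by majority vote among its nearest labeled neighbors. The function names, the gap threshold, and the tiny training set are all invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def line_features(words, gap_threshold=15.0):
    """Hypothetical layout features for one text line.

    words: list of (text, x_start, x_end) tuples ordered left to right.
    Returns (number of wide gaps, fraction of numeric tokens).
    """
    if not words:
        return (0, 0.0)
    gaps = [right[1] - left[2] for left, right in zip(words, words[1:])]
    wide_gaps = sum(1 for gap in gaps if gap > gap_threshold)
    numeric = sum(1 for w in words if w[0].replace(".", "", 1).isdigit())
    return (wide_gaps, numeric / len(words))


# Toy labeled examples (assumed, not taken from the paper).
TRAIN = [
    ((0, 0.0), "text"),
    ((0, 0.1), "text"),
    ((1, 0.0), "text"),
    ((2, 0.7), "table"),
    ((3, 0.5), "table"),
    ((3, 0.8), "table"),
]

# A line with column-like spacing and numbers, and a line of running prose.
row = [("Method", 50, 90), ("0.91", 150, 175), ("0.88", 230, 255)]
sentence = [("Tables", 50, 85), ("are", 89, 105), ("crucial", 109, 150)]

print(knn_predict(TRAIN, line_features(row)))
print(knn_predict(TRAIN, line_features(sentence)))
```

In a real system the word coordinates would come from a PDF parser, and the two features would usually be scaled to comparable ranges before computing distances; both steps are omitted here to keep the sketch short.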
