INFORMATION EXTRACTION FROM OPEN-ENDED SCHEMA-LESS TABLES

Abstract

Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. A header detection engine extracts the headers of each table, and a cell extraction engine extracts the cells of each table. A cell document is generated for each cell; it correlates the cell with the corresponding portions of the headers and records that correlation. Each cell document is then annotated by a cell recognition model trained to perform natural language processing on cell documents: the model classifies each term in a cell document and extracts the relationships between those terms, producing annotated cell documents.
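The cell-document step described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation. It assumes the table has already been extracted as a list of rows, that the first row holds the column headers, and that the first column holds the row headers. The names `CellDocument`, `build_cell_documents`, and `as_text`, as well as the sample table, are hypothetical and chosen only for illustration. The annotation step (the trained cell recognition model) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellDocument:
    """One table cell together with the header text it is correlated with."""
    row_header: str
    column_header: str
    value: str

    def as_text(self) -> str:
        # Flatten the headers and the cell into one string that a text model
        # could take as input (the separator is an arbitrary choice).
        return f"{self.row_header} | {self.column_header} | {self.value}"


def build_cell_documents(table: List[List[str]]) -> List[CellDocument]:
    """Build one cell document per data cell.

    Assumption (not from the source): row 0 holds the column headers and
    column 0 holds the row headers.
    """
    if not table:
        return []
    column_headers = table[0]
    documents: List[CellDocument] = []
    for row in table[1:]:
        if not row:
            continue  # skip empty rows
        row_header = row[0]
        for col_index, value in enumerate(row[1:], start=1):
            # Ragged rows may be wider than the header row; fall back to "".
            if col_index < len(column_headers):
                column_header = column_headers[col_index]
            else:
                column_header = ""
            documents.append(CellDocument(row_header, column_header, value))
    return documents


# Made-up sample table, for illustration only.
example_table = [
    ["Drug", "Dose", "Frequency"],
    ["Aspirin", "100 mg", "daily"],
    ["Ibuprofen", "200 mg", "twice daily"],
]
cell_documents = build_cell_documents(example_table)
for doc in cell_documents:
    print(doc.as_text())
```

Keeping the headers next to the value in each cell document means a downstream text model sees the cell in context, which appears to be the purpose of recording the cell-header correlation.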