AUTOMATED DATA EXTRACTION SYSTEM BASED ON HISTORICAL OR RELATED DATA

Abstract

A system and method for extracting data from structured documents using historical or related data. Structured documents are searched for instances of an attribute value that match a known historical value for the attribute. Document features associated with the attribute value are identified and serve as anchors, each marking a location within the hierarchy of the document structure where the attribute value can be found and extracted. The accuracy of each identified anchor is determined by evaluating how well the anchor's extraction history matches the reported history. Anchors are grouped into anchor sets such that all anchors in a set extract attributes from the same structured document template. The anchors are prioritized according to the determined accuracy, and the prioritized list defines the order in which a structured document template should be searched for an attribute value.
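The anchor discovery, accuracy scoring, and prioritization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes documents are already parsed into nested dicts and lists, treats an anchor as simply the path of keys and indices leading to a value, assumes all sample documents share one template (so anchor-set grouping is omitted), and defines accuracy as the fraction of historical documents for which the anchor returns the reported value. The names `find_anchors`, `extract`, `anchor_accuracy`, and `prioritize_anchors`, and the sample invoice data, are hypothetical.

```python
from typing import Any, Iterator

Path = tuple  # sequence of dict keys / list indices locating a node


def find_anchors(node: Any, known_value: Any, path: Path = ()) -> Iterator[Path]:
    """Yield every path in a nested document whose leaf equals known_value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_anchors(child, known_value, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_anchors(child, known_value, path + (index,))
    elif node == known_value:
        yield path


def extract(doc: Any, anchor: Path) -> Any:
    """Follow an anchor path; return None if the document lacks that location."""
    node = doc
    for step in anchor:
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            return None
    return node


def anchor_accuracy(anchor: Path, docs: list, reported: list) -> float:
    """Fraction of historical documents where the anchor yields the reported value."""
    hits = sum(extract(doc, anchor) == value for doc, value in zip(docs, reported))
    return hits / len(docs)


def prioritize_anchors(docs: list, reported: list) -> list:
    """Return (accuracy, anchor) pairs, most accurate anchor first."""
    candidates = {a for doc, value in zip(docs, reported)
                  for a in find_anchors(doc, value)}
    scored = [(anchor_accuracy(a, docs, reported), a) for a in candidates]
    # Ties are broken by the anchor's string form to keep the order deterministic.
    return sorted(scored, key=lambda s: (-s[0], str(s[1])))


# Hypothetical history: three invoices and the total reported for each.
docs = [
    {"header": {"total": 120, "due": 120}, "lines": [{"amount": 120}]},
    {"header": {"total": 80, "due": 0}, "lines": [{"amount": 50}, {"amount": 30}]},
    {"header": {"total": 95, "due": 95}, "lines": [{"amount": 95}]},
]
reported = [120, 80, 95]

ranked = prioritize_anchors(docs, reported)
for accuracy, anchor in ranked:
    print(f"{accuracy:.2f}  {anchor}")
```

In this sample, the `("header", "total")` path matches the reported value in every document and so ranks first, while paths that only coincidentally match in some documents rank lower. A new document from the same template would be searched in that order.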

Bibliographic details

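  • Publication number: US2018329873A1

  • Publication date: 2018-11-15

  • Original format: PDF

  • Applicant/assignee: GOOGLE INC.

  • Application number: US201514682071

  • Filing date: 2015-04-08

  • Classification: G06F17/22; G06F17/21

  • Country: US

  • Database entry date: 2022-08-21 12:06:16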
