Machine learning of document templates for data extraction

Abstract

The present system can perform machine learning of prototypical descriptions of data elements for extraction from machine-readable documents. Document templates are created from sets of training documents that can be used to extract data from form documents, such as: fill-in forms used for taxes; flex-form documents having many variants, such as bills of lading or insurance notifications; and some context-form documents having a description or graphic indicator in proximity to a data element. In response to training documents, the system performs an inductive reasoning process to generalize a document template so that the location of data elements can be predicted for the training examples. The automatically generated document template can then be used to extract data elements from a wide variety of form documents.
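The abstract does not say how the generalization step is carried out. The sketch below is an illustration only, not the patented method. It assumes each training document supplies a bounding box per data element, and it generalizes by taking the smallest region that covers every training box for that element. The names `Box`, `learn_template`, and `extract`, the field names, and all coordinates are hypothetical. It covers only the fixed-layout (fill-in form) case, not flex-form or context-form documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical units)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        # Smallest box that covers both boxes.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))

    def contains_center(self, other: "Box") -> bool:
        # True when the center point of `other` lies inside this box.
        cx = (other.x0 + other.x1) / 2
        cy = (other.y0 + other.y1) / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def learn_template(training_docs):
    """Generalize per-field regions from annotated training documents.

    training_docs: list of dicts mapping field name -> Box.
    Returns a dict mapping field name -> covering Box.
    """
    template = {}
    for doc in training_docs:
        for name, box in doc.items():
            template[name] = template[name].union(box) if name in template else box
    return template


def extract(template, words):
    """Collect the words whose centers fall inside each template region.

    words: list of (text, Box) pairs for one new document.
    """
    result = {}
    for name, region in template.items():
        hits = [(b.y0, b.x0, text) for text, b in words if region.contains_center(b)]
        # Sort top-to-bottom, then left-to-right, as a rough reading order.
        result[name] = " ".join(text for _, _, text in sorted(hits))
    return result


# Two annotated training documents (made-up coordinates).
training = [
    {"invoice_no": Box(400, 50, 480, 62), "total": Box(420, 700, 470, 712)},
    {"invoice_no": Box(410, 52, 500, 64), "total": Box(415, 705, 480, 717)},
]
template = learn_template(training)

# Words of a new, unseen document with their positions.
words = [
    ("Invoice", Box(340, 52, 390, 64)),
    ("INV-1042", Box(405, 52, 470, 64)),
    ("Total", Box(360, 704, 400, 716)),
    ("118.50", Box(425, 704, 468, 716)),
]
extracted = extract(template, words)
```

By construction, the learned regions cover every training example, which mirrors the abstract's requirement that element locations be predictable for the training documents. A real system would likely need anchors relative to nearby labels or graphics to handle the flex-form and context-form documents the abstract mentions.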

Bibliographic data

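Publication number: US7149347B1
Patent type:
Publication date: 2006-12-12
Original format: PDF
Applicant/patentee: JANUSZ WNEK
Application number: US20000518176
Inventor: JANUSZ WNEK
Filing date: 2000-03-02
Classification: G06K9/00
Country: US
Database entry time: 2022-08-21 21:01:21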
