Extracting data from semi-structured text documents

Abstract

The invention is a process, system, and workflow for extracting and warehousing data from semi-structured documents in any language. This includes, but is not limited to, one or more methods for: the automatic building of text mining term models; the optimization or evolution of such text mining term models; the implementation of document-specific (or company-specific) memory; and the tying or linking of the extracted data, or metadata, once placed in a target electronic document, to the machine-readable, underlying source document, thus providing verification and provenance. The process preferably incorporates a wizard-based method for producing pattern recognition text mining term models to extract data from text. The invention also includes a system, method, and workflow for handling a subsequent document of similar design and structure, specifically the automatic extraction of target elements and their addition to a database.
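
The abstract does not disclose how the term models, the company-specific memory, or the provenance link are represented. Purely as an illustration, the sketch below assumes that a "term model" is a set of label-anchored regular expressions, that provenance is the character offsets of each extracted value in the source text, and that "memory" is a per-company store of term models reused for later documents of similar structure. All names (`Extraction`, `build_term_model`, `extract`, `verify`, `TermModelMemory`) and the sample document are hypothetical. This is not the patented method, and it does not attempt the wizard-based model building or model evolution.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical "term model": field name -> compiled regex whose group 1 captures the value.
TermModel = Dict[str, re.Pattern]


@dataclass(frozen=True)
class Extraction:
    """One extracted value plus its provenance (offsets) in the source text."""
    field: str
    value: str
    start: int  # offset of the first character of the value in the source
    end: int    # offset just past the last character of the value


def build_term_model(labels: Dict[str, str]) -> TermModel:
    """Assumed model form: the label text, optional ':' or '-', then one non-space token."""
    return {
        field: re.compile(re.escape(label) + r"\s*[:\-]?\s*(\S+)", re.IGNORECASE)
        for field, label in labels.items()
    }


def extract(document: str, model: TermModel) -> list[Extraction]:
    """Apply each pattern once and record where its captured value sits in the source."""
    results = []
    for field, pattern in model.items():
        match = pattern.search(document)
        if match:
            results.append(Extraction(field, match.group(1), match.start(1), match.end(1)))
    return results


def verify(document: str, item: Extraction) -> bool:
    """Provenance check: the stored offsets must still point at the stored value."""
    return document[item.start:item.end] == item.value


class TermModelMemory:
    """Hypothetical per-company store so later documents of similar design reuse a model."""

    def __init__(self) -> None:
        self._models: Dict[str, TermModel] = {}

    def remember(self, company: str, model: TermModel) -> None:
        self._models[company] = model

    def get(self, company: str) -> Optional[TermModel]:
        return self._models.get(company)


if __name__ == "__main__":
    doc = "ACME Corp Annual Report\nTotal revenue: 4,200\nNet income - 310\n"
    memory = TermModelMemory()
    memory.remember("ACME", build_term_model({"revenue": "Total revenue", "net_income": "Net income"}))
    for item in extract(doc, memory.get("ACME")):
        print(item.field, item.value, verify(doc, item))  # e.g. "revenue 4,200 True"
```

Keeping offsets rather than only the values means a value placed in a target document can later be checked against, or highlighted in, the original source. A real system would need richer patterns than a single token after a label.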
