TEXT BASED SCHEMA DISCOVERY AND INFORMATION EXTRACTION

Abstract

Various technologies and techniques are disclosed for text based schema discovery and information extraction. Documents are analyzed to identify sections of the documents and a relationship between the sections. Statistics are stored regarding occurrences of items in the documents. A probabilistic model is generated based on the stored statistics. A database schema is generated with a plurality of tables based upon the probabilistic model. The documents are analyzed against the probabilistic model to determine how the documents map to the tables generated from the database schema. The tables are populated from the documents based on a result of the analysis against the probabilistic model.
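The steps named in the abstract (section analysis, occurrence statistics, a probabilistic model, schema generation, and table population) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not specify the document format, the form of the probabilistic model, or how tables are derived. The sketch assumes a toy format in which "# Heading" lines open a section and "key: value" lines are items, uses a simple relative frequency P(key | section) as the model, and creates one table per section with a hypothetical probability threshold. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def parse_sections(text):
    """Split a document into {section: {key: value}}.

    Assumed toy format: '# Heading' opens a section and
    'key: value' lines belong to the most recent section.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# "):
            current = line[2:].strip().lower()
            sections.setdefault(current, {})
        elif ":" in line and current is not None:
            key, value = line.split(":", 1)
            sections[current][key.strip().lower()] = value.strip()
    return sections


def collect_statistics(parsed_docs):
    """Count how many documents contain each section and each item."""
    section_counts = Counter()
    item_counts = defaultdict(Counter)
    for sections in parsed_docs:
        for heading, items in sections.items():
            section_counts[heading] += 1
            for key in items:
                item_counts[heading][key] += 1
    return section_counts, item_counts


def build_model(section_counts, item_counts):
    """Estimate P(key | section) as a relative frequency (an assumption)."""
    return {
        heading: {key: n / section_counts[heading] for key, n in keys.items()}
        for heading, keys in item_counts.items()
    }


def derive_schema(model, threshold=0.5):
    """One table per section; columns are items at or above the threshold."""
    schema = {}
    for heading, probs in model.items():
        columns = sorted(k for k, p in probs.items() if p >= threshold)
        if columns:
            schema[heading] = columns
    return schema


def populate(parsed_docs, schema):
    """Map each document onto the tables, leaving missing items as None."""
    tables = {name: [] for name in schema}
    for doc_id, sections in enumerate(parsed_docs):
        for name, columns in schema.items():
            if name in sections:
                row = {"doc_id": doc_id}
                row.update({c: sections[name].get(c) for c in columns})
                tables[name].append(row)
    return tables


# Made-up sample documents, for illustration only.
docs = [
    "# Patient\nname: Ann\nage: 41\n# Visit\ndate: 2008-05-27\nnote: routine",
    "# Patient\nname: Bob\nage: 37\n# Visit\ndate: 2008-06-02",
    "# Patient\nname: Cy\nnickname: C\n# Visit\ndate: 2008-06-09",
]
parsed = [parse_sections(d) for d in docs]
section_counts, item_counts = collect_statistics(parsed)
model = build_model(section_counts, item_counts)
schema = derive_schema(model, threshold=0.5)
tables = populate(parsed, schema)
```

With these sample documents, rarely seen items such as "nickname" and "note" fall below the threshold and are left out of the schema, while items that are frequent but absent from a given document are stored as None in that document's row. A real system would likely use a richer model than relative frequencies; the threshold rule is only one possible way to turn statistics into tables.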

Bibliographic data

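  • Publication number: US2009300043A1

  • Publication date: 2009-12-03

  • Original format: PDF

  • Applicant/patentee: C. JAMES MACLENNAN

  • Application number: US20080127017

  • Inventor: C. JAMES MACLENNAN

  • Filing date: 2008-05-27

  • Classification: G06F17/30

  • Country: US

  • Date added to database: 2022-08-21 18:50:43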
