
EXTRACTING INFORMATION FROM STRUCTURED DOCUMENTS CONTAINING TEXT IN NATURAL LANGUAGE


Abstract

FIELD: data processing.

SUBSTANCE: the invention relates to a method, a computer-readable data medium, and a system for extracting data from a structured document. The method involves: receiving, by a computing device, a table containing natural-language text; identifying the table header and the multiple cells forming its rows and columns; performing semantic-syntactic analysis of the natural-language text to obtain multiple semantic structures; interpreting those semantic structures using a first set of production rules to obtain a data object represented by the table, where the production rules of this set include logical expressions defined on structural templates; analyzing the table header to determine multiple ontology-based classes associated with the corresponding columns of the table; and modifying the data object represented by the table using a second set of production rules, where the production rules of this set are associated with the ontology-based classes of the table's columns.

EFFECT: higher accuracy in forming the data object of a structured document, achieved through the additional analysis of the table and the modification of the resulting data object based on that analysis.

18 claims, 19 drawings.
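The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `analyze` function is only a trivial stand-in for semantic-syntactic analysis, and every name here (`HEADER_TO_CLASS`, `FIRST_RULES`, `SECOND_RULES`, `extract`) and the toy ontology are hypothetical, chosen only to show how two rule sets might be applied in sequence.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: normalized header text -> ontology class name.
HEADER_TO_CLASS = {"name": "Person", "employer": "Organization", "salary": "Money"}


@dataclass
class Cell:
    row: int
    col: int
    text: str


def parse_table(rows):
    """Treat the first row as the header and the rest as body cells."""
    header, *body = rows
    cells = [Cell(r, c, text)
             for r, row in enumerate(body)
             for c, text in enumerate(row)]
    return header, cells


def analyze(text):
    """Stand-in for semantic-syntactic analysis (assumption: a real system
    would build a full semantic structure, not this small dictionary)."""
    digits = text.replace(",", "")
    return {"text": text, "is_number": digits.isdigit(), "digits": digits}


# First rule set: (condition on the structure, action producing attributes).
FIRST_RULES = [
    (lambda s: s["is_number"],
     lambda s: {"kind": "Number", "value": int(s["digits"])}),
    (lambda s: not s["is_number"],
     lambda s: {"kind": "Entity", "value": s["text"]}),
]

# Second rule set: keyed by the ontology class of the cell's column.
SECOND_RULES = {
    "Person": lambda obj: {**obj, "kind": "Person"},
    "Organization": lambda obj: {**obj, "kind": "Organization"},
    "Money": lambda obj: {**obj, "kind": "Money"} if obj["kind"] == "Number" else obj,
}


def extract(rows):
    """Build a data object from the cells, then refine it per column class."""
    header, cells = parse_table(rows)
    column_classes = [HEADER_TO_CLASS.get(h.strip().lower()) for h in header]
    data = []
    for cell in cells:
        structure = analyze(cell.text)
        # Apply the first matching rule of the first set.
        obj = next(action(structure)
                   for cond, action in FIRST_RULES if cond(structure))
        # Refine the object with the rule tied to the column's class, if any.
        cls = column_classes[cell.col]
        if cls in SECOND_RULES:
            obj = SECOND_RULES[cls](obj)
        data.append({"row": cell.row, "col": cell.col, **obj})
    return data
```

In this sketch, a cell such as "52,000" is first interpreted generically as a number, and only the header analysis ("Salary") lets the second rule set reclassify it as a monetary value, which mirrors the accuracy gain the abstract attributes to the additional table analysis.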
