AUTOMATIC TRANSFORMATION OF COMPLEX TABLES IN DOCUMENTS INTO COMPUTER UNDERSTANDABLE STRUCTURED FORMAT WITH MAPPED DEPENDENCIES AND PROVIDING SCHEMA-LESS QUERY SUPPORT FOR SEARCHING TABLE DATA

Abstract

An information processing system, a computer readable storage medium, and a computer-implemented method collect tables from a corpus of documents and convert the collected tables to a flattened table format that is organized to be searchable by schema-less queries. The method collects tables and, for each collected table, extracts feature values from the collected table data and the collected table meta-data. A table classifier classifies each collected table as being a particular type of table. Based on that classification, the collected table is converted to a flattened table whose table values are the table data and the table meta-data of the collected table. Dependencies among the data values are mapped. The flattened table and the mapped dependencies are stored in a triple store that is searchable by schema-less queries. The table classifier learns and improves its accuracy and reliability. Dependency information is maintained among a plurality of database tables, and it can be updated at a variable update frequency.
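As a rough illustration of the flatten-then-store idea described in the abstract, the following Python sketch turns a simple table into subject-predicate-object triples and searches them without knowing the column names. It is only a minimal sketch under stated assumptions: the function names `flatten_table` and `search`, the `inTable` predicate, the row identifier format, and the sample data are all hypothetical, and the abstract does not specify the patent's actual flattening rules, classifier, or triple store.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def flatten_table(table_id: str, header: Sequence[str],
                  rows: Sequence[Sequence[object]]) -> List[Triple]:
    """Flatten a table into triples, one triple per cell.

    Assumption: each row becomes a subject, each column header becomes a
    predicate, and a hypothetical "inTable" predicate records which table
    the row depends on.
    """
    triples: List[Triple] = []
    for index, row in enumerate(rows):
        row_id = f"{table_id}/row{index}"
        triples.append((row_id, "inTable", table_id))
        for column, value in zip(header, row):
            triples.append((row_id, column, str(value)))
    return triples


def search(store: Sequence[Triple], *terms: str) -> List[str]:
    """Schema-less lookup: return the rows in which every term appears
    in some predicate or object, without naming any column."""
    by_row: Dict[str, List[str]] = {}
    for subject, predicate, obj in store:
        by_row.setdefault(subject, []).append(f"{predicate} {obj}".lower())
    return [
        row_id
        for row_id, texts in by_row.items()
        if all(any(term.lower() in text for text in texts) for term in terms)
    ]


# Hypothetical sample table, used only for illustration.
header = ["Drug", "Dose (mg)", "Age group"]
rows = [
    ["Aspirin", 100, "Adult"],
    ["Aspirin", 50, "Child"],
    ["Ibuprofen", 200, "Adult"],
]
store = flatten_table("doc1/table1", header, rows)
hits = search(store, "aspirin", "adult")
```

Here `hits` holds the identifiers of the rows that mention both terms. A production system would more likely keep the triples in an RDF store and query them with a dedicated query language; the in-memory list is used only to keep the sketch self-contained.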
