Method for extracting, interpreting and standardizing tabular data from unstructured documents

Abstract

A system, method, and computer program are provided for automatically identifying, parsing, and interpreting tabular data in unstructured documents stored in formats such as ASCII text, Unicode text, HTML, PDF text, and PDF image. A set of table identification, parsing/tokenizing, and interpreting/mapping rules is developed using grammar descriptors. These rules are then applied to a set of documents to identify each table, parse its content, and, where required, interpret the parsed content, thereby standardizing the tabular data.
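
The three rule stages named in the abstract (identify, parse/tokenize, interpret/map) can be illustrated with a minimal Python sketch for plain ASCII text. The abstract does not describe the actual grammar descriptors, so everything here is an assumption made for illustration: the regular expression `COLUMN_SPLIT`, the `HEADER_MAP` dictionary, and the functions `identify_table`, `interpret_cell`, and `standardize` are hypothetical names, not the patented method.

```python
import re

# Hypothetical rules; the source does not specify its grammar descriptors.
COLUMN_SPLIT = re.compile(r"\s{2,}|\t")   # tokenizing rule: 2+ spaces or a tab separate cells
HEADER_MAP = {"yr": "year", "year": "year", "rev": "revenue", "revenue": "revenue"}


def identify_table(lines, min_cols=2, min_rows=2):
    """Return the first run of consecutive lines that split into the same
    number of cells (at least min_cols), as a list of tokenized rows."""
    run = []
    for line in lines:
        cells = [c for c in COLUMN_SPLIT.split(line.strip()) if c]
        if len(cells) >= min_cols and (not run or len(cells) == len(run[0])):
            run.append(cells)
        else:
            if len(run) >= min_rows:
                return run
            # Start a new candidate run if this line still looks tabular.
            run = [cells] if len(cells) >= min_cols else []
    return run if len(run) >= min_rows else []


def interpret_cell(text):
    """Interpreting rule: turn '$1,200' or '(300)' into a number, else keep the text."""
    cleaned = text.replace(",", "").replace("$", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    try:
        value = float(cleaned)
    except ValueError:
        return text
    if value.is_integer():
        value = int(value)
    return -value if negative else value


def standardize(rows):
    """Mapping rule: treat the first row as the header, normalize its names,
    and return one dictionary per data row."""
    header = [HEADER_MAP.get(h.lower().rstrip("."), h.lower()) for h in rows[0]]
    return [dict(zip(header, (interpret_cell(c) for c in row))) for row in rows[1:]]


document = """Annual summary for the company.

Yr    Rev.
2021  $1,200
2022  (300)

See notes for details."""

rows = identify_table(document.splitlines())
records = standardize(rows)
print(records)
```

Keeping each stage as a separate function with its own rule data mirrors the abstract's split between identification, parsing, and interpretation rules. Formats such as HTML or PDF image would need their own identification and tokenizing steps, which this sketch does not attempt.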
