Method for extracting, interpreting and standardizing tabular data from unstructured documents
Abstract
A system, method, and computer program are provided for automatically identifying, parsing, and interpreting tabular data from unstructured documents stored in formats such as ASCII text, Unicode text, HTML, PDF text, and PDF image. A set of table-identification, parsing/tokenizing, and interpreting/mapping rules is developed using grammar descriptors. These rules are then applied to a set of documents to identify a table, parse its content, and, where required, interpret the parsed content, thereby standardizing the tabular data.
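
The three rule stages named in the abstract (identification, parsing/tokenizing, interpreting/mapping) can be illustrated with a minimal Python sketch for plain ASCII text. The abstract does not disclose the actual grammar descriptors, so everything here is an assumption made for illustration: regular expressions stand in for the grammar descriptors, and the names `TABLE_ROW`, `COLUMN_SPLIT`, `HEADER_MAP`, `find_tables`, and `standardize`, as well as the sample text, are hypothetical.

```python
import re

# Hypothetical rules; regular expressions stand in for grammar descriptors.
CELL = r"\S+(?: \S+)*"  # a cell may contain single spaces
# Identification rule: a row is two or more cells separated by 2+ whitespace chars.
TABLE_ROW = re.compile(rf"{CELL}(?:\s{{2,}}{CELL})+")
# Parsing/tokenizing rule: split a row on runs of 2+ whitespace chars.
COLUMN_SPLIT = re.compile(r"\s{2,}")
# Interpreting/mapping rule: map source headers to standardized field names.
HEADER_MAP = {"item": "item", "qty": "quantity", "amt": "amount"}


def find_tables(text):
    """Group consecutive lines that match TABLE_ROW into candidate tables."""
    tables, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if TABLE_ROW.fullmatch(stripped):
            current.append(stripped)
            continue
        if len(current) >= 2:  # require a header plus at least one data row
            tables.append(current)
        current = []
    if len(current) >= 2:
        tables.append(current)
    return tables


def standardize(table):
    """Tokenize each row, then map header names to standardized field names."""
    header, *rows = [COLUMN_SPLIT.split(line) for line in table]
    names = [HEADER_MAP.get(h.lower(), h.lower()) for h in header]
    return [dict(zip(names, row)) for row in rows]


SAMPLE = """Order summary for the quarter.

Item         Qty  Amt
Blue widget  3    4.50
Gasket       12   0.80

Totals are approximate."""

records = [standardize(table) for table in find_tables(SAMPLE)]
```

In this sketch the prose lines are skipped because they contain no multi-space column gaps, and the one detected table is returned as a list of dictionaries keyed by the standardized names. Handling HTML or PDF input, as the abstract mentions, would need format-specific extraction before rules of this kind could be applied.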