SYSTEM AND METHOD FOR FORMAT-AGNOSTIC DOCUMENT INGESTION
Abstract
A system for format-agnostic document ingestion, including a document ingestion server and a database, is disclosed. The server is configured to receive an image of a document comprising text in an unknown format and to convert the image, using OCR, into a plurality of text elements, each having a content, a size, and an absolute position. The server is also configured to retrieve data detectors from the database, each associated with a data type anticipated to be in the document and each comprising at least one identifier, at least one direction, and at least one validation criterion. The server is also configured to identify a potential descriptor by comparing the content of each text element with the at least one identifier, and then to determine whether the text element pointed to by the data detector meets the validation criteria. Finally, the server is configured to associate the validated text element with the data detector and to store its content.
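The detector flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not give class names, a direction vocabulary, a matching rule, or a distance rule, so `TextElement`, `DataDetector`, `find_target`, `ingest`, the `"right"`/`"below"` directions, the case-insensitive substring match, and the nearest-neighbor rule are all hypothetical choices made here for illustration. OCR output is assumed to be already available as a list of elements, and the database is replaced by an in-memory list of detectors.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextElement:
    """One OCR text element: content, size, and absolute position (assumed layout)."""
    content: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass
class DataDetector:
    """Hypothetical detector: identifiers, a direction, and a validation criterion."""
    data_type: str
    identifiers: List[str]
    direction: str                    # "right" or "below" (assumed vocabulary)
    validate: Callable[[str], bool]


def find_target(descriptor: TextElement, elements: List[TextElement],
                direction: str) -> Optional[TextElement]:
    """Return the nearest element lying in `direction` from the descriptor.

    The nearest-neighbor rule is an assumption; the abstract only says the
    detector "points to" a text element.
    """
    best, best_dist = None, float("inf")
    for el in elements:
        if el is descriptor:
            continue
        if direction == "right":
            aligned = abs(el.y - descriptor.y) <= descriptor.height
            dist = el.x - (descriptor.x + descriptor.width)
        elif direction == "below":
            aligned = abs(el.x - descriptor.x) <= descriptor.width
            dist = el.y - (descriptor.y + descriptor.height)
        else:
            raise ValueError(f"unsupported direction: {direction}")
        if aligned and 0 <= dist < best_dist:
            best, best_dist = el, dist
    return best


def ingest(elements: List[TextElement],
           detectors: List[DataDetector]) -> Dict[str, str]:
    """Match descriptors, follow each detector's direction, validate, and store."""
    stored: Dict[str, str] = {}
    for det in detectors:
        for el in elements:
            # Potential descriptor: element content contains one of the identifiers.
            if not any(i.lower() in el.content.lower() for i in det.identifiers):
                continue
            target = find_target(el, elements, det.direction)
            if target is not None and det.validate(target.content):
                stored[det.data_type] = target.content
                break
    return stored


# Usage with made-up OCR output and two made-up detectors.
elements = [
    TextElement("Invoice No:", 10, 10, 80, 12),
    TextElement("INV-1042", 100, 10, 60, 12),
    TextElement("Date", 10, 40, 30, 12),
    TextElement("2021-03-05", 10, 56, 70, 12),
]
detectors = [
    DataDetector("invoice_number", ["invoice no"], "right",
                 lambda s: re.fullmatch(r"INV-\d+", s) is not None),
    DataDetector("date", ["date"], "below",
                 lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None),
]
result = ingest(elements, detectors)
```

Because each detector carries its own identifiers, direction, and validation rule, nothing in the sketch depends on a fixed page template, which is one plausible reading of "format-agnostic" in the title.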