MULTI-MODEL, MULTI-TASK TRAINED NEURAL NETWORK FOR ANALYZING UNSTRUCTURED AND SEMI-STRUCTURED ELECTRONIC DOCUMENTS
Abstract
Embodiments of the invention describe a computer-implemented method of analyzing an electronic version of a document. The method can use an architecture of machine learning sub-models that together perform the global task of translating unstructured and semi-structured inputs into numerical representations that a content-analysis (CA) sub-model can recognize and manipulate without relying on brute-force analysis. Embodiments of the invention achieve this by separating the global task into auxiliary tasks and assigning each sub-model to at least one of them. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., the lists, tables, figures, and text of a PDF document), extracting features from the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, with both the stages and the groupings determined by the complexity of each sub-model's assigned task.
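The three auxiliary tasks and the complexity-based staged training can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `SubModel` class, the task labels, the integer complexity scores, and the `toy_parse`/`toy_extract`/`toy_analyze` stand-ins are all hypothetical, and real sub-models would be trained neural networks rather than Python functions. The abstract says only that stages and groupings depend on task complexity; training the simplest group first is an assumption.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List, Tuple

@dataclass
class SubModel:
    """Hypothetical stand-in for one sub-model assigned to an auxiliary task."""
    name: str
    task: str          # illustrative labels: "parse", "extract", "analyze"
    complexity: int    # assumed numeric complexity of the assigned task
    trained: bool = False

def build_stages(models: List[SubModel]) -> List[List[SubModel]]:
    """Group sub-models by task complexity; assumed order is simplest first."""
    ordered = sorted(models, key=lambda m: m.complexity)
    return [list(g) for _, g in groupby(ordered, key=lambda m: m.complexity)]

def train_in_stages(models: List[SubModel],
                    train_group: Callable[[List[SubModel]], None]) -> None:
    for group in build_stages(models):
        train_group(group)      # placeholder for jointly training one group
        for m in group:
            m.trained = True

def run_pipeline(doc, parse, extract, analyze):
    """Parse -> extract numeric features per format type -> content analysis."""
    segments = parse(doc)
    features = {fmt: extract(items) for fmt, items in segments.items()}
    return analyze(features)

# Toy stand-ins for trained sub-models (hypothetical, for illustration only).
def toy_parse(doc: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    segments: Dict[str, List[str]] = {}
    for fmt, content in doc:    # simplification: blocks arrive pre-labeled
        segments.setdefault(fmt, []).append(content)
    return segments

def toy_extract(items: List[str]) -> List[int]:
    return [len(items), sum(len(s) for s in items)]   # [block count, char count]

def toy_analyze(features: Dict[str, List[int]]) -> str:
    return max(features, key=lambda fmt: features[fmt][1])  # format with most chars

models = [
    SubModel("content_analyzer", "analyze", complexity=3),
    SubModel("layout_parser", "parse", complexity=1),
    SubModel("table_features", "extract", complexity=2),
    SubModel("text_features", "extract", complexity=2),
]
log: List[List[str]] = []
train_in_stages(models, lambda group: log.append([m.name for m in group]))
print(log)  # [['layout_parser'], ['table_features', 'text_features'], ['content_analyzer']]

doc = [("text", "Revenue grew."), ("table", "Q1|Q2"), ("text", "Costs fell.")]
print(run_pipeline(doc, toy_parse, toy_extract, toy_analyze))  # text
```

Because `sorted` is stable, sub-models with equal complexity keep their listed order within a group, and each group is handed to `train_group` as a unit, mirroring the idea of training sub-models "in groups" rather than one at a time.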