MULTI-MODEL, MULTI-TASK TRAINED NEURAL NETWORK FOR ANALYZING UNSTRUCTURED AND SEMI-STRUCTURED ELECTRONIC DOCUMENTS

Abstract

Embodiments of the invention describe a computer-implemented method of analyzing an electronic version of a document. The computer-implemented method can include an architecture of machine learning sub-models that performs the global task of translating unstructured and semi-structured inputs into numerical representations that can be recognized and manipulated by a content-analysis (CA) sub-model without relying on brute-force analysis. Embodiments of the invention achieve these results by separating the global task into auxiliary tasks and assigning each sub-model to at least one of the auxiliary tasks. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., the lists, tables, figures, and text of a PDF document), extracting features of the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, wherein both the stages and the groupings are based on the complexity of each sub-model's assigned task.
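
The staged, grouped training described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the sub-model names, the integer complexity scores, the rule that sub-models with equal scores form one group, and the simplest-first ordering of stages. The abstract states only that stages and groupings are based on task complexity.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SubModel:
    name: str        # hypothetical identifier for the sub-model
    task: str        # auxiliary task assigned to this sub-model
    complexity: int  # hypothetical score: lower means a simpler task


def training_schedule(sub_models):
    """Group sub-models by complexity and order the groups simplest first.

    Returns a list of stages; each stage is a list of sub-model names
    that would be trained together (assumed grouping rule).
    """
    ordered = sorted(sub_models, key=lambda m: (m.complexity, m.name))
    return [
        [m.name for m in group]
        for _, group in groupby(ordered, key=lambda m: m.complexity)
    ]


# Illustrative sub-models covering the three auxiliary tasks in the abstract.
SUB_MODELS = [
    SubModel("content_analysis", "content analysis on extracted features", 3),
    SubModel("format_parser", "parse input into lists, tables, figures, text", 1),
    SubModel("text_features", "extract features from text", 2),
    SubModel("table_features", "extract features from tables", 2),
]

schedule = training_schedule(SUB_MODELS)
for stage, group in enumerate(schedule, start=1):
    print(f"stage {stage}: train {group} together")
```

In this sketch the parser is trained alone in the first stage, the two feature extractors share the second stage, and the CA sub-model is trained last. A real system could assign complexity and form groups quite differently.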