UTILIZING MACHINE LEARNING MODELS, POSITION-BASED EXTRACTION, AND AUTOMATED DATA LABELING TO PROCESS IMAGE-BASED DOCUMENTS

Abstract

A device may receive image data that includes an image of a document and lexicon data identifying a lexicon, and may perform an extraction technique on the image data to identify at least one field in the document. The device may utilize form segmentation to automatically generate label data identifying labels for the image data, and may process the image data, the label data, and data identifying the at least one field, with a first model, to identify visual features. The device may process the image data and the visual features, with a second model, to identify sequences of characters, and may process the image data and the sequences of characters, with a third model, to identify strings of characters. The device may compare the lexicon data and the strings of characters to generate verified strings of characters that may be utilized to generate a digitized document.
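The final step of the abstract, comparing the recognized strings against the lexicon to produce verified strings, can be illustrated with a small sketch. The abstract does not say how the comparison is performed; the fuzzy matching below (Python's standard `difflib`) is an assumption chosen for illustration, and the function name `verify_strings`, the `cutoff` value, and the sample lexicon are all hypothetical.

```python
from difflib import get_close_matches


def verify_strings(strings, lexicon, cutoff=0.75):
    """Pair each recognized string with its closest lexicon entry.

    Returns a list of (recognized, verified) tuples. `verified` is None
    when no lexicon entry is similar enough. Fuzzy matching is an
    assumption here, not the method described in the patent.
    """
    # Map lowercase forms back to the canonical lexicon spelling so the
    # comparison is case-insensitive.
    canonical = {entry.lower(): entry for entry in lexicon}
    results = []
    for recognized in strings:
        matches = get_close_matches(
            recognized.lower(), list(canonical), n=1, cutoff=cutoff
        )
        verified = canonical[matches[0]] if matches else None
        results.append((recognized, verified))
    return results


# Hypothetical lexicon and recognizer output with typical OCR confusions.
lexicon = ["Invoice", "Total", "Amount"]
recognized = ["lnvoice", "Tota1", "xyzzy"]
for raw, fixed in verify_strings(recognized, lexicon):
    print(f"{raw!r} -> {fixed!r}")
```

A string with no sufficiently close lexicon entry is returned with `None` rather than being forced onto the nearest entry, so a downstream step can flag it for review instead of silently accepting a wrong correction.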
