Unsupervised domain adaptation from generic forms for new OCR forms

Abstract

The disclosed technology is generally directed to optical text recognition for forms. In one example of the technology, line grouping rules are generated based on the generic forms and a ground truth for the generic forms. Line groupings are applied to the generic forms based on the line grouping rules. Feature extraction rules are generated. Features are extracted from the generic forms based on the feature extraction rules. A key-value classifier model is generated, such that the key-value classifier model is configured to determine, for each line of a form: a probability that the line is a value, and a probability that the line is a key. A key-value pairing model is generated, such that the key-value pairing model is configured to predict, for each key in a form, which value in the form corresponds to the key.
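The flow described in the abstract (extract features per line, score each line as key or value, then pair each key with a value) can be illustrated with a minimal sketch. This is not the patented method: the `Line` type, the two features, the hand-set weights, and the nearest-neighbor pairing rule are all hypothetical stand-ins for the rules and models that the disclosure generates from generic forms and their ground truth.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    x: float  # left edge of the line's bounding box (hypothetical layout info)
    y: float  # top edge of the line's bounding box


def extract_features(line):
    """Illustrative hand-picked features; the disclosure derives its own rules."""
    text = line.text.strip()
    digits = sum(ch.isdigit() for ch in text)
    return {
        "ends_with_colon": 1.0 if text.endswith(":") else 0.0,
        "digit_ratio": digits / max(len(text), 1),
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights standing in for a trained key-value classifier.
KEY_WEIGHTS = {"ends_with_colon": 4.0, "digit_ratio": -3.0, "bias": -1.5}
VALUE_WEIGHTS = {"ends_with_colon": -4.0, "digit_ratio": 3.0, "bias": 0.5}


def score(features, weights):
    z = weights["bias"] + sum(weights[name] * v for name, v in features.items())
    return sigmoid(z)


def classify(line):
    """Return (probability line is a key, probability line is a value)."""
    features = extract_features(line)
    return score(features, KEY_WEIGHTS), score(features, VALUE_WEIGHTS)


def pair_keys_with_values(lines):
    """Pair each key with the closest value to its right or below it."""
    keys, values = [], []
    for line in lines:
        p_key, p_value = classify(line)
        (keys if p_key > p_value else values).append(line)

    pairs = {}
    for key in keys:
        candidates = [v for v in values if v.x >= key.x and v.y >= key.y]
        if candidates:
            best = min(
                candidates,
                key=lambda v: math.hypot(v.x - key.x, v.y - key.y),
            )
            pairs[key.text] = best.text
    return pairs


form = [
    Line("Name:", 10, 10), Line("Ada Lovelace", 120, 10),
    Line("Invoice No:", 10, 40), Line("48213", 120, 40),
]
pairs = pair_keys_with_values(form)
```

In this sketch the pairing step is a geometric heuristic. The abstract instead describes a separately generated key-value pairing model, which would replace the distance rule with a learned prediction.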
