Extracting searchable information from a digitized document

Abstract

Data extraction from, and automatic validation of, digitized documents in non-editable formats are disclosed. Paper documents are digitized or converted into formats suitable for storage on computers or other digital devices. Each digitized document is classified into one of a plurality of document types, and, based on the document type, document processing rules are selected for analyzing the document to enable data extraction and automatic validation. The positions and values of the data fields in the digitized documents are obtained using machine learning techniques. The data field values are automatically validated and assigned confidence scores. Data fields with low confidence scores are flagged for manual review.
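
The flow described in the abstract (classify, select rules by document type, extract field positions and values, validate, score, flag) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract names machine learning techniques without specifying them, so a keyword check and regular expressions stand in for the classifier and the field extractor here. The names `RULES`, `REVIEW_THRESHOLD`, `ExtractedField`, `classify`, and `extract_fields`, the two document types, and the 0.9/0.3 confidence values are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Assumed cut-off for manual review; the abstract does not give a value.
REVIEW_THRESHOLD = 0.5

# Hypothetical processing rules per document type:
# field name -> (extraction regex, validation regex)
RULES = {
    "invoice": {
        "invoice_number": (r"Invoice No:\s*(\S+)", r"[A-Z]+-\d+"),
        "total": (r"Total:\s*(\S+)", r"\d+\.\d{2}"),
    },
    "receipt": {
        "total": (r"Amount paid:\s*(\S+)", r"\d+\.\d{2}"),
    },
}


@dataclass
class ExtractedField:
    name: str
    value: str
    span: tuple          # (start, end) character offsets of the value
    confidence: float
    needs_review: bool


def classify(text):
    """Keyword stand-in for a trained document-type classifier."""
    return "invoice" if "invoice" in text.lower() else "receipt"


def extract_fields(text):
    """Classify the text, then extract, validate, and score its fields."""
    doc_type = classify(text)
    fields = []
    for name, (extract_re, valid_re) in RULES[doc_type].items():
        match = re.search(extract_re, text)
        if match is None:
            continue  # field not found; a fuller system might flag this too
        value = match.group(1)
        # Toy scoring: high confidence only if the value passes validation.
        confidence = 0.9 if re.fullmatch(valid_re, value) else 0.3
        fields.append(
            ExtractedField(
                name=name,
                value=value,
                span=match.span(1),
                confidence=confidence,
                needs_review=confidence < REVIEW_THRESHOLD,
            )
        )
    return doc_type, fields
```

Calling `extract_fields("Invoice No: INV-1042\nTotal: 12O.50")` simulates an OCR error in which the letter O replaces a zero in the total. Under the assumptions above, the total fails validation, receives the low score, and is marked for manual review, while the invoice number passes.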