Targeted optical character recognition (OCR) for medical terminology

Abstract

Embodiments of the present invention provide concepts for correcting optical character recognition (OCR) errors in an OCR scan result by sequentially applying an anagram hash (AH) and a Levenshtein-Distance (LD) measurement for concurrent character identity-based (machine code) and character shape-based (OCR-Key) corrections. The OCR-Key classifies characters by shape into one or more disjoint and overlapping classes. Similar shape-based classes appearing in consecutive characters are appended with a cardinality term, a repetition count of the class. The LD measurement groups OCR-Keys and differentiates on both class and cardinality to arrive at a shape-based mismatch error between competing candidate words from an associated dictionary and a target word from the OCR scan. The shape-based LD measurement errors are then functionally merged with the character identity-based deletion, substitution, and insertion errors to find a minimum error over the set of candidate words, which identifies the preferred candidate word match for the target word.
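
The shape-based and identity-based measurements described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The shape table `_SHAPES`, the substitution costs, the `shape_weight` parameter, and the additive merge in `merged_error` are all hypothetical choices made for illustration. The anagram-hash candidate retrieval step is left out, so candidates are passed in directly.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical shape table (an assumption, not taken from the patent):
# the first letter of each key is the shape class, the key length is
# the cardinality contributed by one character of that group.
_SHAPES = {
    "i": "fijlrt1",      # one vertical stroke
    "ii": "nhu",         # two vertical strokes
    "iii": "m",          # three vertical strokes
    "o": "abcdegopq0",   # one round body
    "v": "vy",
    "vv": "w",
}
SHAPE = {ch: (key[0], len(key)) for key, chars in _SHAPES.items() for ch in chars}

Key = Tuple[str, int]  # (shape class, cardinality)


def ocr_key(word: str) -> List[Key]:
    """Map a word to shape classes, merging consecutive equal classes
    and summing their cardinalities."""
    key: List[Key] = []
    for ch in word.lower():
        cls, n = SHAPE.get(ch, (ch, 1))  # unknown characters form their own class
        if key and key[-1][0] == cls:
            key[-1] = (cls, key[-1][1] + n)
        else:
            key.append((cls, n))
    return key


def edit_distance(a: Sequence, b: Sequence, sub_cost: Callable) -> float:
    """Levenshtein distance with unit insert/delete cost and a
    caller-supplied substitution cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                     # deletion
                           cur[j - 1] + 1,                  # insertion
                           prev[j - 1] + sub_cost(x, y)))   # substitution
        prev = cur
    return prev[-1]


def char_cost(x: str, y: str) -> float:
    return 0.0 if x == y else 1.0


def shape_cost(x: Key, y: Key) -> float:
    # Assumed costs: a class mismatch is a full error, a cardinality
    # mismatch within the same class is half an error.
    if x[0] != y[0]:
        return 1.0
    return 0.0 if x[1] == y[1] else 0.5


def merged_error(target: str, candidate: str, shape_weight: float = 1.0) -> float:
    """Identity-based LD plus weighted shape-based LD (assumed merge)."""
    identity = edit_distance(target, candidate, char_cost)
    shape = edit_distance(ocr_key(target), ocr_key(candidate), shape_cost)
    return identity + shape_weight * shape


def best_match(target: str, candidates: Sequence[str]) -> str:
    """Return the candidate with the minimum merged error."""
    return min(candidates, key=lambda w: merged_error(target, w))
```

Under this table, the OCR output `hcart` is one character edit away from both `heart` and `cart`, so the identity-based distance alone cannot separate them. The OCR-Keys of `hcart` and `heart` are identical, while `cart` loses a whole shape group, so `best_match("hcart", ["cart", "chart", "heart"])` selects `heart`.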

Bibliographic details

  • Publication number: US9633271B2

  • Publication date: 2017-04-25

  • Original format: PDF

  • Applicant/assignee: OPTUM INC.

  • Application number: US201615140849

  • Inventor: CASEY STELLA

  • Filing date: 2016-04-28

  • Classification: G06K9/18; G06K9/03; G06K9; G06K9/72; G06K9/62; G06T7; G06K9/20; G06F3/0488

  • Country: US

  • Database entry time: 2022-08-21 13:44:07
