
Modified Levenshtein distance algorithm for coding


Abstract

Methods and systems for mapping an optical character recognition (OCR) text string to a code included in a coding dictionary. The Levenshtein Distance Algorithm (LDA) is supplemented with additional information in the form of adjustments based on particular character substitutions, insertions and deletions, together with weighting based on multiple alternatives for the OCR text string. An OCR text string mapping method 100 includes: receiving 110 an OCR text string; comparing 120 it with selected text strings from a coding dictionary; computing 130 modified Levenshtein distances associated with the comparisons by determining substitution 140, insertion 150 and deletion 160 penalties and combining 170 the penalties; selecting 180 the best-matching text string from the coding dictionary based on the modified Levenshtein distances; determining 190 whether a maximum threshold distance is met; and assigning 200 the code associated with the best-matching text string to the OCR text string when the threshold is met, or assigning 210 a null or no code when it is not.
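The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the penalty tables, their values, the function names `modified_distance` and `assign_code`, and the sample dictionary are all hypothetical, and the weighting over multiple OCR alternatives is left out because the abstract does not describe how it is computed. The sketch also assumes that the threshold is "met" when the best distance does not exceed the maximum.

```python
from typing import Dict, Optional, Tuple

# Hypothetical penalty tables (not from the patent): edits that OCR
# engines plausibly make often are charged less than the default 1.0.
SUB_PENALTY: Dict[Tuple[str, str], float] = {
    ("0", "O"): 0.2, ("O", "0"): 0.2,
    ("1", "I"): 0.2, ("I", "1"): 0.2,
    ("5", "S"): 0.3, ("S", "5"): 0.3,
}
INS_PENALTY: Dict[str, float] = {".": 0.5}            # characters OCR may drop
DEL_PENALTY: Dict[str, float] = {".": 0.5, "'": 0.5}  # specks read as characters
DEFAULT_PENALTY = 1.0


def modified_distance(ocr: str, ref: str) -> float:
    """Weighted edit distance for turning the OCR string into `ref`."""
    # Row for the empty OCR prefix: insert every character of `ref`.
    prev = [0.0]
    for r in ref:
        prev.append(prev[-1] + INS_PENALTY.get(r, DEFAULT_PENALTY))

    for o in ocr:
        delete_cost = DEL_PENALTY.get(o, DEFAULT_PENALTY)
        cur = [prev[0] + delete_cost]
        for j, r in enumerate(ref, start=1):
            sub = 0.0 if o == r else SUB_PENALTY.get((o, r), DEFAULT_PENALTY)
            cur.append(min(
                prev[j - 1] + sub,                                  # substitution
                cur[j - 1] + INS_PENALTY.get(r, DEFAULT_PENALTY),   # insertion
                prev[j] + delete_cost,                              # deletion
            ))
        prev = cur
    return prev[-1]


def assign_code(ocr: str, dictionary: Dict[str, str],
                max_distance: float = 1.0) -> Optional[str]:
    """Return the code of the closest dictionary entry, or None if too far."""
    if not dictionary:
        return None
    best = min(dictionary, key=lambda ref: modified_distance(ocr, ref))
    if modified_distance(ocr, best) <= max_distance:
        return dictionary[best]
    return None


# Hypothetical coding dictionary: text string -> code.
codes = {"ROAD": "R01", "BOAT": "B02"}
print(assign_code("R0AD", codes))   # zero misread as the letter O
print(assign_code("XYZW", codes))   # nothing close enough
```

With these illustrative tables, the misread zero in "R0AD" costs only 0.2 against "ROAD", so the entry is accepted under the threshold of 1.0, whereas an unrelated string falls outside the threshold and receives no code. In the plain Levenshtein distance every edit costs 1; the only change here is that the cost depends on which characters are involved.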

Bibliographic details

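  • Publication number: GB0701002D0
  • Patent type:
  • Publication date: 2007-02-28
  • Original format: PDF
  • Applicant/assignee: LOCKHEED MARTIN CORPORATION
  • Application number: GB20070001002
  • Inventor:
  • Filing date: 2007-01-18
  • Classification: G06K9/20; G06F17/22; G06K17
  • Country: GB
  • Date added to database: 2022-08-21 20:26:40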
