MODIFIED LEVENSHTEIN DISTANCE ALGORITHM FOR CODING

Abstract

Methods and systems for mapping an optical character recognition (OCR) text string to a code included in a coding dictionary by supplementing the Levenshtein Distance Algorithm (LDA) with additional information, in the form of adjustments based on particular character substitutions, insertions and deletions, together with weighting based on multiple alternatives for the OCR text string. In one embodiment, an OCR text string mapping method (100) includes receiving (110) an OCR text string, comparing (120) it with selected text strings from a coding dictionary, and computing (130) modified Levenshtein distances associated with the comparisons by determining (140) substitution penalties, determining (150) insertion penalties, determining (160) deletion penalties and combining (170) the penalties. The method then selects (180) the best matching text string from the coding dictionary based on the modified Levenshtein distances, determines (190) whether a maximum threshold distance is met, assigns (200) a code associated with the best matching text string to the OCR text string when the threshold is met, and assigns (210) a null or no code when it is not.
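The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The penalty values, the `CONFUSABLE` table, and the function names `modified_levenshtein` and `assign_code` are all hypothetical, chosen only for illustration. "Threshold is met" is assumed to mean that the best distance does not exceed `max_distance`. The weighting across multiple alternative OCR readings is not modeled here.

```python
from typing import Dict, Optional

# Hypothetical penalty table: character pairs that OCR commonly confuses
# are charged less than the default substitution cost of 1.0.
CONFUSABLE = {("0", "O"): 0.2, ("1", "l"): 0.2, ("5", "S"): 0.3}


def substitution_penalty(a: str, b: str) -> float:
    """Cost of replacing OCR character a with dictionary character b."""
    if a == b:
        return 0.0
    return CONFUSABLE.get((a, b), CONFUSABLE.get((b, a), 1.0))


def insertion_penalty(ch: str) -> float:
    """Cost of inserting a dictionary character missing from the OCR text."""
    return 0.5 if ch == " " else 1.0  # assumed: a missing space is cheap


def deletion_penalty(ch: str) -> float:
    """Cost of deleting a spurious character from the OCR text."""
    return 0.5 if ch == " " else 1.0  # assumed: a stray space is cheap


def modified_levenshtein(ocr: str, entry: str) -> float:
    """Weighted edit distance from the OCR string to a dictionary entry."""
    # prev[j] is the cost of turning the empty prefix into entry[:j].
    prev = [0.0]
    for ch in entry:
        prev.append(prev[-1] + insertion_penalty(ch))
    for a in ocr:
        cur = [prev[0] + deletion_penalty(a)]
        for j, b in enumerate(entry, start=1):
            cur.append(min(
                prev[j - 1] + substitution_penalty(a, b),  # substitute or match
                prev[j] + deletion_penalty(a),             # delete from OCR text
                cur[j - 1] + insertion_penalty(b),         # insert into OCR text
            ))
        prev = cur
    return prev[-1]


def assign_code(ocr: str, dictionary: Dict[str, str],
                max_distance: float) -> Optional[str]:
    """Return the code of the closest entry, or None if it is too far away."""
    best_text, best_dist = None, float("inf")
    for text in dictionary:
        dist = modified_levenshtein(ocr, text)
        if dist < best_dist:
            best_text, best_dist = text, dist
    if best_text is not None and best_dist <= max_distance:
        return dictionary[best_text]
    return None
```

In this sketch, calling `assign_code("C0DE", {"CODE": "A1", "NODE": "B2"}, 1.0)` picks the entry `"CODE"`, because the `0`/`O` confusion is penalized lightly, while a string far from every entry yields `None`.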
