MODIFIED LEVENSHTEIN DISTANCE ALGORITHM FOR CODING
Abstract
Methods and systems for mapping an optical character recognition (OCR) text string to a code included in a coding dictionary. The Levenshtein Distance Algorithm (LDA) is supplemented with additional information in the form of adjustments based on particular character substitutions, insertions and deletions, together with weighting based on multiple alternatives for the OCR text string. In one embodiment, an OCR text string mapping method (100) includes receiving (110) an OCR text string; comparing (120) it with selected text strings from a coding dictionary; computing (130) modified Levenshtein distances associated with the comparisons by determining (140) substitution penalties, determining (150) insertion penalties, determining (160) deletion penalties and combining (170) the penalties; selecting (180) the best matching text string from the coding dictionary based on the modified Levenshtein distances; determining (190) whether a maximum threshold distance is met; and assigning (200) a code associated with the best matching text string to the OCR text string when the threshold is met, or assigning (210) a null or no code when it is not.
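The steps of the embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the penalty values, the confusion table, the function names and the sample dictionary codes are all hypothetical, chosen only to show how per-character substitution, insertion and deletion penalties can replace the unit costs of the standard Levenshtein distance, and how a maximum threshold decides between assigning a code and assigning none. The weighting over multiple OCR alternatives mentioned in the abstract is left out.

```python
from typing import Dict, Optional

# Hypothetical penalties: character pairs that OCR commonly confuses
# cost less than an ordinary substitution (which costs 1.0).
CONFUSION_COST = {("0", "O"): 0.2, ("1", "l"): 0.2, ("5", "S"): 0.3}
# Hypothetical assumption: stray punctuation is cheap to insert or delete.
CHEAP_CHARS = set(" .,-")


def sub_cost(a: str, b: str) -> float:
    """Substitution penalty for replacing character a with b."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get((a, b), CONFUSION_COST.get((b, a), 1.0))


def ins_cost(c: str) -> float:
    """Insertion penalty for a character present only in the dictionary entry."""
    return 0.5 if c in CHEAP_CHARS else 1.0


def del_cost(c: str) -> float:
    """Deletion penalty for a character present only in the OCR string."""
    return 0.5 if c in CHEAP_CHARS else 1.0


def modified_levenshtein(ocr: str, entry: str) -> float:
    """Weighted edit distance from the OCR string to a dictionary entry."""
    n = len(entry)
    # prev[j] is the cost of turning the empty prefix of ocr into entry[:j].
    prev = [0.0] * (n + 1)
    for j in range(1, n + 1):
        prev[j] = prev[j - 1] + ins_cost(entry[j - 1])
    for i in range(1, len(ocr) + 1):
        cur = [prev[0] + del_cost(ocr[i - 1])] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + del_cost(ocr[i - 1]),                # deletion
                cur[j - 1] + ins_cost(entry[j - 1]),           # insertion
                prev[j - 1] + sub_cost(ocr[i - 1], entry[j - 1]),  # substitution
            )
        prev = cur
    return prev[n]


def assign_code(ocr: str, dictionary: Dict[str, str],
                max_distance: float) -> Optional[str]:
    """Return the code of the closest entry, or None if it exceeds the threshold."""
    best_text, best_dist = None, float("inf")
    for text in dictionary:
        dist = modified_levenshtein(ocr, text)
        if dist < best_dist:
            best_text, best_dist = text, dist
    if best_text is not None and best_dist <= max_distance:
        return dictionary[best_text]
    return None


# Hypothetical coding dictionary mapping text strings to codes.
CODES = {"CLERK": "4110", "NURSE": "3221"}
print(assign_code("NUR5E", CODES, max_distance=1.0))
print(assign_code("XXXXX", CODES, max_distance=1.0))
```

Under these assumed penalties, an OCR string that differs from a dictionary entry only by a commonly confused character stays well inside the threshold, while an unrelated string falls outside it and receives no code.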