Conference: Document Recognition II
Modified character-level deciphering algorithm for OCR in degraded documents

Abstract

This paper presents modifications to a previous character-level deciphering algorithm for OCR that allow it to handle touching characters and to tolerate mistakes made at the clustering stage. The objective of a character-level deciphering algorithm is to assign alphabetic identities to character patterns such that the character repetition pattern in the input text matches the letter repetition pattern provided by a language model. Degradation in document images commonly produces touching characters and errors in clustering the character patterns, both of which pose difficulties for character-level deciphering algorithms. The proposed modifications tightly integrate visual constraints from characters and touching patterns with constraints from a language model in order to decode touching characters and to detect and reverse clustering mistakes. The result is a deciphering algorithm that performs robustly under image degradation.
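The deciphering objective stated in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: it assumes perfect clustering and no touching characters, and it uses a small word lexicon as a stand-in for the language model. The names `repetition_pattern`, `decipher`, and `decode` are hypothetical and chosen only for illustration. Each word is a sequence of integer cluster IDs, and a backtracking search looks for a one-to-one assignment of letters to cluster IDs under which every word's repetition pattern matches some lexicon word.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def repetition_pattern(symbols: Sequence[Hashable]) -> Tuple[int, ...]:
    """Return the repetition pattern, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen: Dict[Hashable, int] = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in symbols)


def decipher(words: List[Sequence[int]], lexicon: List[str]) -> Optional[Dict[int, str]]:
    """Find a one-to-one cluster-ID -> letter mapping consistent with the lexicon.

    Assumption: the lexicon plays the role of the language model, and
    clustering is error-free (the paper addresses the harder, degraded case).
    """
    # Index lexicon words by their repetition pattern.
    by_pattern: Dict[Tuple[int, ...], List[str]] = {}
    for entry in lexicon:
        by_pattern.setdefault(repetition_pattern(entry), []).append(entry)

    def candidates(word: Sequence[int]) -> List[str]:
        return by_pattern.get(repetition_pattern(word), [])

    # Visit the most constrained words (fewest candidates) first.
    order = sorted(words, key=lambda w: len(candidates(w)))

    def search(i: int, mapping: Dict[int, str]) -> Optional[Dict[int, str]]:
        if i == len(order):
            return mapping
        for cand in candidates(order[i]):
            trial = dict(mapping)
            used = set(trial.values())
            consistent = True
            for cluster_id, letter in zip(order[i], cand):
                if cluster_id in trial:
                    if trial[cluster_id] != letter:
                        consistent = False
                        break
                elif letter in used:
                    # Letter already assigned to a different cluster.
                    consistent = False
                    break
                else:
                    trial[cluster_id] = letter
                    used.add(letter)
            if consistent:
                result = search(i + 1, trial)
                if result is not None:
                    return result
        return None

    return search(0, {})


def decode(word: Sequence[int], mapping: Dict[int, str]) -> str:
    """Spell out a cluster-ID word using the recovered mapping."""
    return "".join(mapping[c] for c in word)


if __name__ == "__main__":
    toy_lexicon = ["the", "cat", "that", "hat", "each"]
    toy_words = [[1, 2, 3, 1], [4, 3, 1], [1, 2, 5], [5, 3, 4, 2]]
    found = decipher(toy_words, toy_lexicon)
    if found is not None:
        print([decode(w, found) for w in toy_words])
```

In this toy input the word `[1, 2, 3, 1]` has the pattern (0, 1, 2, 0), which only "that" matches in the lexicon, so it fixes three letters at once and constrains the remaining words. A touching-character pattern or a merged cluster would violate the one-to-one assumption used here, which is the situation the modifications described in the abstract are designed to handle.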
