Conference proceedings: Applications of Digital Image Processing XXXV

OCR enhancement through neighbor embedding and fast approximate nearest neighbors

Abstract

Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speedup of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (pixels per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing.

© (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
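The abstract reports results as character error rates but does not say how CER is computed. As a minimal sketch, assuming the common definition (Levenshtein edit distance between the OCR output and the ground-truth text, divided by the ground-truth length), CER could be calculated as follows. The function names `edit_distance` and `character_error_rate` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning ref into hyp."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    truth = "neighbor embedding"
    ocr_output = "neighbar embeding"  # one substitution, one deletion
    print(f"CER = {character_error_rate(truth, ocr_output):.3f}")
```

Under this definition, insertions can push CER above 100%, so the "over 90%" figure quoted for the low-resolution scans means the raw OCR output was almost entirely wrong.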