Source: Conference on Applications of Digital Image Processing

OCR enhancement through neighbor embedding and fast approximate nearest neighbors

Abstract

Generic optical character recognition (OCR) engines often perform very poorly in transcribing scanned low resolution (LR) text documents. To improve OCR performance, we apply the Neighbor Embedding (NE) single-image super-resolution (SISR) technique to LR scanned text documents to obtain high resolution (HR) versions, which we subsequently process with OCR. For comparison, we repeat this procedure using bicubic interpolation (BI). We demonstrate that mean-square errors (MSE) in NE HR estimates do not increase substantially when NE is trained in one Latin font style and tested in another, provided both styles belong to the same font category (serif or sans serif). This is very important in practice, since for each font size, the number of training sets required for each category may be reduced from dozens to just one. We also incorporate randomized k-d trees into our NE implementation to perform approximate nearest neighbor search, and obtain a 1000x speed up of our original NE implementation, with negligible MSE degradation. This acceleration also made it practical to combine all of our size-specific NE Latin models into a single Universal Latin Model (ULM). The ULM eliminates the need to determine the unknown font category and size of an input LR text document and match it to an appropriate model, a very challenging task, since the dpi (pixels per inch) of the input LR image is generally unknown. Our experiments show that OCR character error rates (CER) were over 90% when we applied the Tesseract OCR engine to LR text documents (scanned at 75 dpi and 100 dpi) in the 6-10 pt range. By contrast, using k-d trees and the ULM, CER after NE preprocessing averaged less than 7% at 3x (100 dpi LR scanning) and 4x (75 dpi LR scanning) magnification, over an order of magnitude improvement. Moreover, CER after NE preprocessing was more than 6 times lower on average than after BI preprocessing. © (2012) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
Downloading of the abstract is permitted for personal use only.
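
The abstract reports results as character error rates (CER) but does not spell out how CER is computed. The sketch below assumes the common definition: the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The function names and the sample strings are hypothetical and are not taken from the paper. Under this definition, a drop from above 0.90 to below 0.07 is roughly a 13-fold reduction, consistent with the "over an order of magnitude" claim.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed definition: edit distance / reference length.
    The value can exceed 1.0 when the OCR output has many spurious characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "neighbor embedding"   # hypothetical ground-truth text
    ocr_output = "neighbor embeding"   # hypothetical OCR transcription
    print(f"CER = {character_error_rate(reference, ocr_output):.3f}")
```

In an evaluation like the one described, the same function would be applied to the OCR output of the unprocessed LR scan, the BI-upscaled image, and the NE-upscaled image, each against the same reference transcription.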
