A shadow removal method for Tesseract text recognition

Abstract

Tesseract's character recognition performance drops significantly on shadowed text images. In this paper, we propose a new method for preprocessing shadowed text images before they are passed to the Tesseract optical character recognition engine. First, a local adaptive threshold algorithm transforms the grayscale image into a binary image to capture the contours of the text. Next, to remove the salt-and-pepper noise in the shadow areas, we propose a double-filtering algorithm in which a projection method removes the noise between text regions and a median filter removes the noise within characters. Finally, the processed binary image is fed into the Tesseract optical character recognition engine. Experimental results show that the proposed method achieves better character recognition performance.
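The three preprocessing steps named in the abstract can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the abstract gives no parameters or details, so the function names, the window radius, the offset, the row-wise form of the projection step, the `min_count` cutoff, and the order of the two filters are all assumptions made for this example.

```python
from statistics import median


def adaptive_threshold(gray, radius=2, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean minus `offset`.

    `gray` is a list of rows of 0-255 intensities. `radius` and `offset`
    are illustrative values, not taken from the paper.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            if gray[y][x] < sum(window) / len(window) - offset:
                out[y][x] = 1
    return out


def projection_filter(binary, min_count=2):
    """Clear rows whose horizontal projection (ink count) is below `min_count`.

    Assumption: sparse rows are gaps between text lines, so any ink there
    is treated as noise. The paper's projection method may differ.
    """
    return [row[:] if sum(row) >= min_count else [0] * len(row)
            for row in binary]


def median_filter(binary):
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [binary[j][i]
                      for j in (y - 1, y, y + 1)
                      for i in (x - 1, x, x + 1)]
            out[y][x] = int(median(window))
    return out


def preprocess(gray):
    """Threshold, then apply the two filters (order is an assumption)."""
    return median_filter(projection_filter(adaptive_threshold(gray)))
```

On real images these loops would be slow. One would more likely reach for library routines such as OpenCV's `cv2.adaptiveThreshold` and `cv2.medianBlur`, and hand the cleaned binary image to Tesseract through a wrapper such as `pytesseract.image_to_string`. Those library choices are also assumptions here; the abstract does not say which tools the authors used.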
