Asian Conference on Pattern Recognition

Automatic Annotation Method for Document Image Binarization in Real Systems




The accuracy of optical character recognition (OCR) has improved significantly in recent years through the use of deep learning. However, when OCR is used in real applications, a shortage of annotated images often makes training difficult. Automatic annotation methods have been proposed to address this problem, but many of them are based on active learning and require operators to confirm the generated annotation candidates. I propose a practical automatic annotation method for binarization, which is one of the components of OCR. The purpose of the proposed method is to confirm the quality of annotation candidates automatically. The method consists of three simple steps. First, a text region is cropped from the whole image. Second, the cropped image is binarized at every threshold. Third, all binarized cropped images are recognized, and the recognition results are matched against a correct-character database (DB). If the characters match, the cropped binary image is regarded as correctly binarized, and the method selects it as an annotation for binarization. The cropping coordinates and the correct-character DB can be obtained from a practical OCR system: because users of such a system usually enter corrections for OCR misrecognitions, the system can collect both the correct characters and their coordinates. The experimental results indicate that annotations generated with the proposed method improve the performance of deep-learning-based binarization. As a result, the normalized edit distance between the recognized text and the ground-truth text is reduced by 38.56% on the Find it! receipt image dataset.
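
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the `(top, left, bottom, right)` box layout, the 8-bit grayscale input, and the `recognize` callable (a stand-in for whatever OCR engine the system uses) are all hypothetical. The abstract also does not say which candidate is kept when several thresholds produce a matching result; this sketch simply returns the first match.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical box layout: (top, left, bottom, right) in pixel coordinates.
Box = Tuple[int, int, int, int]


def crop(image: np.ndarray, box: Box) -> np.ndarray:
    """Step 1: crop a text region from the whole grayscale image."""
    top, left, bottom, right = box
    return image[top:bottom, left:right]


def binarize(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Step 2: global thresholding. Pixels brighter than the threshold
    become white (255); all others become black (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def select_annotation(
    image: np.ndarray,
    box: Box,
    correct_text: str,
    recognize: Callable[[np.ndarray], str],
) -> Optional[np.ndarray]:
    """Step 3: recognize every binarized candidate and compare the result
    with the correct text taken from the correction database.

    Returns the first binarized crop whose recognition result equals
    `correct_text` (an assumption of this sketch), or None if no
    threshold yields a match.
    """
    patch = crop(image, box)
    for threshold in range(256):  # every threshold of an 8-bit image
        candidate = binarize(patch, threshold)
        if recognize(candidate) == correct_text:
            return candidate
    return None
```

In a real system, `recognize` would wrap the deployed OCR engine, and `box` and `correct_text` would come from the corrections that users have entered. Crops for which the function returns None would simply yield no annotation.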




