Automatic Closed-Loop Methodology for Generating Character Groundtruth for Scanned Documents


Abstract

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manually collecting accurate groundtruth for characters in a real (scanned) document image is not practical because (1) the accuracy with which character bounding boxes can be delineated by hand is not high enough, (2) the work is extremely laborious and time-consuming, and (3) the manual labor required is prohibitively expensive. In this paper, we describe a closed-loop methodology for collecting very accurate groundtruth for scanned documents. We first create ideal documents using a typesetting language. Next, we create the groundtruth for the ideal document. The ideal document is then printed, photocopied, and scanned. A registration algorithm estimates the global geometric transformation and then performs a robust local bitmap match to register the ideal document image to the scanned document image. Finally, the groundtruth associated with the ideal document image is transformed using the estimated geometric transformation to create the groundtruth for the scanned document image. This methodology is very general and can be used to create groundtruth for typeset documents in any language, layout, font, and style.
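
The final step described above, carrying groundtruth from the ideal image over to the scanned image, can be illustrated with a minimal Python sketch. It assumes the global geometric transformation is already available as a 3x3 homogeneous (projective) matrix and that groundtruth is stored as labeled axis-aligned bounding boxes. The abstract specifies neither, so the matrix form, the data layout, and the names `apply_transform`, `transform_box`, and `transform_groundtruth` are hypothetical. The sketch does not cover estimating the transformation or the local bitmap matching step.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Matrix = List[List[float]]               # assumed 3x3 homogeneous transform, row-major


def apply_transform(H: Matrix, p: Point) -> Point:
    """Map one point through a 3x3 homogeneous transform."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transform")
    return (u / w, v / w)


def transform_box(H: Matrix, box: Box) -> Box:
    """Map the four corners of a box and return their enclosing axis-aligned box."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    mapped = [apply_transform(H, c) for c in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return (min(xs), min(ys), max(xs), max(ys))


def transform_groundtruth(
    H: Matrix, groundtruth: List[Tuple[str, Box]]
) -> List[Tuple[str, Box]]:
    """Carry (label, box) groundtruth from ideal-image to scanned-image coordinates."""
    return [(label, transform_box(H, box)) for label, box in groundtruth]


# Illustrative values only: a scale by 2 followed by a shift of (10, 5).
H_example = [[2.0, 0.0, 10.0],
             [0.0, 2.0, 5.0],
             [0.0, 0.0, 1.0]]
ideal_groundtruth = [("A", (0.0, 0.0, 4.0, 6.0))]
scanned_groundtruth = transform_groundtruth(H_example, ideal_groundtruth)
print(scanned_groundtruth)  # [('A', (10.0, 5.0, 18.0, 17.0))]
```

Taking the enclosing axis-aligned box of the mapped corners is a simplification made for this sketch. Under rotation or perspective the mapped corners form a general quadrilateral, and a real pipeline might keep all four corners instead.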
