Automatic Word Ground Truth Generation for Camera Captured Documents

Abstract

A database of camera captured documents is useful for training OCR systems to achieve better performance. However, no such dataset exists, because building one manually is laborious and costly. This paper proposes a fully automatic approach for building very large scale (i.e., millions of images) labeled datasets of camera captured documents. The approach requires no human intervention in labeling. Evaluation of samples generated by the proposed approach shows that more than 97% of the images are correctly labeled. The novelty of the approach lies in its use of document image retrieval for automatic labeling, in particular for camera captured documents, which contain camera-specific distortions such as blur and perspective distortion.