
Tools for Semi-automatic Preparation of Training Data for OCR

Abstract

This work addresses data preparation for OCR systems based on recurrent neural networks. Precisely annotated data are necessary both for training a network and for evaluating OCR methods. Such data can be synthesized, but synthetic data are not as realistic as real data. Manual annotation is therefore still needed in many cases, especially for the historical documents we focus on. Although several complex systems for historical document processing exist, to the best of our knowledge a simple annotation tool for OCR data is missing. We therefore propose and implement a set of tools that use artificial intelligence to simplify the annotation process. These tools create ground truths for the line images used to train current OCR systems. A further contribution of this paper is that the tools are made freely available for research purposes.