OCR error correction using correction patterns and self-organizing migrating algorithm

Abstract

Optical character recognition (OCR) systems help to digitize paper-based historical archives. However, the poor quality of scanned documents and the limitations of text recognition techniques introduce different kinds of errors into OCR output. Post-processing is an essential step for improving the output quality of OCR systems by detecting and correcting these errors. In this paper, we present an automatic OCR post-processing model consisting of both an error detection phase and an error correction phase. We propose a novel error correction approach that combines correction pattern edits with an evolutionary algorithm, a class of techniques mainly used for solving optimization problems. Our model adopts a variant of the self-organizing migrating algorithm together with a fitness function based on modifications of important linguistic features. We illustrate how to construct the table of correction pattern edits, which covers all types of edit operations and is learned directly from the training dataset. With efficient settings of the algorithm parameters, our model achieves high-quality candidate generation and error correction. The experimental results show that our proposed approach outperforms various baseline approaches on the benchmark dataset of the ICDAR 2017 Post-OCR text correction competition.
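
The abstract does not spell out how the correction pattern table is built or used, so the following is only a minimal sketch under our own assumptions, not the authors' implementation. It learns character-level correction patterns (replacements, deletions, and insertions) from aligned OCR/ground-truth token pairs using Python's difflib.SequenceMatcher, applies each pattern to a noisy token to generate candidates, and ranks them with a toy score (lexicon frequency times pattern frequency). The self-organizing migrating algorithm and the paper's linguistic fitness function are not reproduced; exhaustive ranking stands in for them. The function names learn_patterns, generate_candidates, and best_correction, the insertion-anchoring rule, and the sample data are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical sketch: names, anchoring rule and scoring are illustrative only,
# not the method described in the paper.


def learn_patterns(pairs):
    """Count character-level correction patterns (src -> dst) from
    (ocr_token, gold_token) pairs, covering replace, delete and insert edits."""
    table = Counter()
    for ocr, gold in pairs:
        matcher = SequenceMatcher(None, ocr, gold, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            src, dst = ocr[i1:i2], gold[j1:j2]
            if tag == "insert":
                # Assumption: anchor a pure insertion on a neighbouring OCR
                # character so that every pattern has a non-empty source.
                if i1 > 0:
                    src, dst = ocr[i1 - 1], ocr[i1 - 1] + dst
                elif i2 < len(ocr):
                    src, dst = ocr[i2], dst + ocr[i2]
                else:
                    continue  # empty OCR token: nothing to anchor on
            table[(src, dst)] += 1
    return table


def generate_candidates(token, table):
    """Apply each pattern once at every position where its source occurs;
    a candidate's weight is the summed frequency of the patterns producing it."""
    candidates = Counter()
    for (src, dst), freq in table.items():
        start = token.find(src)
        while start != -1:
            candidate = token[:start] + dst + token[start + len(src):]
            candidates[candidate] += freq
            start = token.find(src, start + 1)
    return candidates


def best_correction(token, table, lexicon):
    """Toy fitness (assumption): lexicon frequency x pattern weight.
    Returns the token unchanged if no candidate is a known word."""
    scores = {
        cand: lexicon[cand] * weight
        for cand, weight in generate_candidates(token, table).items()
        if cand in lexicon
    }
    return max(scores, key=scores.get) if scores else token


if __name__ == "__main__":
    pairs = [("tbe", "the"), ("rnodern", "modern"), ("wich", "which"), ("thhe", "the")]
    table = learn_patterns(pairs)
    lexicon = Counter({"the": 10, "this": 5, "which": 3, "modern": 2})
    for noisy in ["tbis", "wich", "rnodern", "xyz"]:
        print(noisy, "->", best_correction(noisy, table, lexicon))
```

For instance, the pattern b→h learned from ("tbe", "the") turns "tbis" into the lexicon word "this". The sketch applies only one pattern per candidate; combining several edits makes the candidate space grow quickly, which is presumably where a search procedure such as the self-organizing migrating algorithm becomes useful.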

Bibliographic details

  • Source
    Pattern Analysis and Applications | 2021, Issue 2 | pp. 701-721 | 21 pages
  • Author affiliations

    Van Lang Univ Co Giang Ward 45 Nguyen Khac Nhu Dist 1 Ho Chi Minh City Vietnam|Tech Univ Ostrava Dept Comp Sci FEECS VSB 17 Listopadu 15 Ostrava 70833 Czech Republic;

    Ctr Open Data Humanities Tokyo 1018430 Japan|Nguyen Tat Thanh Univ NTT Hitech Inst 300A Nguyen Tat Thanh Dist 4 Ho Chi Minh City Vietnam;

    Univ Informat Technol Linh Trung Ward Quarter 6 Ho Chi Minh City Vietnam;

    Tech Univ Ostrava Dept Comp Sci FEECS VSB 17 Listopadu 15 Ostrava 70833 Czech Republic;

  • Indexed in: Science Citation Index (SCI)
  • Original format: PDF
  • Text language: English
  • Keywords

    OCR; N-grams; Similarity; Context; Correction pattern; Evolutionary algorithm;