Conference: Document Recognition II

Counting OCR errors in typeset text


Abstract

Frequently, object recognition accuracy is a key component in the performance analysis of pattern matching systems. In the past three years, the results of numerous excellent and rigorous studies of OCR system typeset-character accuracy (henceforth OCR accuracy) have been published, encouraging performance comparisons between a variety of OCR products and technologies. These published figures are important; OCR vendor advertisements in the popular trade magazines lead readers to believe that published OCR accuracy figures affect market share in the lucrative OCR market. Curiously, a detailed review of many of these OCR error counting results reveals that they are not reproducible as published and are not strictly comparable, because the variances in the counts are larger than sampling variance alone would explain. Naturally, since OCR accuracy is based on the ratio of the number of OCR errors to the size of the text searched for errors, imprecise OCR error accounting leads to similar imprecision in OCR accuracy. Some published papers use informal, non-automatic, or intuitively correct OCR error accounting. Still other published results present OCR error accounting methods based on string matching algorithms, such as dynamic programming using Levenshtein (edit) distance, but omit critical implementation details (such as the existence of suspect markers in the OCR-generated output or the weights used in the dynamic programming minimization procedure). The problem with not revealing the accounting method specifically is that the numbers of errors found by different methods differ significantly. This paper identifies the basic accounting methods used to measure OCR errors in typeset text and offers an evaluation and comparison of the various accounting methods.
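To illustrate why the weights in the dynamic programming step matter, here is a minimal Python sketch of a weighted Levenshtein error count. It is not the paper's method: the function names `count_ocr_errors` and `accuracy`, the cost parameters, and the sample strings are all assumptions chosen for illustration, and accuracy is taken as one minus the ratio of errors to ground-truth length.

```python
def count_ocr_errors(truth, ocr, sub_cost=1, ins_cost=1, del_cost=1):
    """Weighted Levenshtein distance between ground truth and OCR output.

    The cost parameters are hypothetical; published studies may use
    other weights, which is exactly the detail the abstract says is
    often left unstated.
    """
    # prev[j] = cost of turning an empty truth prefix into ocr[:j]
    prev = [j * ins_cost for j in range(len(ocr) + 1)]
    for i, t in enumerate(truth, start=1):
        curr = [i * del_cost]  # cost of deleting truth[:i] entirely
        for j, o in enumerate(ocr, start=1):
            curr.append(min(
                prev[j] + del_cost,       # truth character missing from OCR
                curr[j - 1] + ins_cost,   # extra character in OCR output
                prev[j - 1] + (0 if t == o else sub_cost),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def accuracy(truth, ocr, **weights):
    """Assumed definition: 1 - (error count / ground-truth length)."""
    return 1 - count_ocr_errors(truth, ocr, **weights) / len(truth)


# "rn" misread as "m" is a classic typeset OCR confusion.
truth, ocr = "modern", "modem"
unit = count_ocr_errors(truth, ocr)                  # substitution costs 1
indel = count_ocr_errors(truth, ocr, sub_cost=2)     # substitution = delete + insert
print(unit, indel)
```

With unit costs the sample pair counts as 2 errors; charging a substitution as a deletion plus an insertion counts the same output as 3 errors, so the reported accuracy changes although the OCR output does not.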
