Counting OCR errors in typeset text

Abstract: Object recognition accuracy is frequently a key component in the performance analysis of pattern matching systems. In the past three years, the results of numerous excellent and rigorous studies of OCR system typeset-character accuracy (henceforth OCR accuracy) have been published, encouraging performance comparisons between a variety of OCR products and technologies. These published figures are important; OCR vendor advertisements in the popular trade magazines lead readers to believe that published OCR accuracy figures affect market share in the lucrative OCR market. Curiously, a detailed review of many of these OCR error counting results reveals that they are not reproducible as published and are not strictly comparable, because the variances in the counts are larger than the sampling variance would predict. Naturally, since OCR accuracy is based on the ratio of the number of OCR errors to the size of the text searched for errors, imprecise OCR error accounting leads to similar imprecision in OCR accuracy. Some published papers use informal, non-automatic, or intuitively correct OCR error accounting. Still other published results present OCR error accounting methods based on string matching algorithms, such as dynamic programming using Levenshtein (edit) distance, but omit critical implementation details (such as the existence of suspect markers in the OCR-generated output or the weights used in the dynamic programming minimization procedure). The problem with not specifically revealing the accounting method is that the numbers of errors found by different methods differ significantly. This paper identifies the basic accounting methods used to measure OCR errors in typeset text and offers an evaluation and comparison of the various accounting methods.
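
To illustrate why undisclosed weights matter, the following is a minimal Python sketch of edit-distance error counting by dynamic programming. It is not the paper's method: the function names `count_errors` and `ocr_accuracy`, the default weights, the tie-breaking rule (fewest edits among minimum-cost alignments), and the choice of the ground-truth length as the "size of the text" are all assumptions made for the example.

```python
def count_errors(truth, ocr, ins_cost=1, del_cost=1, sub_cost=1):
    """Return (total_cost, n_edits) for a minimum-cost alignment.

    Hypothetical helper: among alignments with the same total cost,
    the one with the fewest edit operations is preferred.
    """
    # prev[j] = (cost, edits) for aligning the empty prefix of truth with ocr[:j]
    prev = [(j * ins_cost, j) for j in range(len(ocr) + 1)]
    for i in range(1, len(truth) + 1):
        # Aligning truth[:i] with an empty OCR string takes i deletions.
        curr = [(i * del_cost, i)]
        for j in range(1, len(ocr) + 1):
            if truth[i - 1] == ocr[j - 1]:
                diag = prev[j - 1]  # characters match, no edit
            else:
                diag = (prev[j - 1][0] + sub_cost, prev[j - 1][1] + 1)
            deletion = (prev[j][0] + del_cost, prev[j][1] + 1)
            insertion = (curr[j - 1][0] + ins_cost, curr[j - 1][1] + 1)
            # Tuples compare by cost first, then by number of edits.
            curr.append(min(diag, deletion, insertion))
        prev = curr
    return prev[-1]


def ocr_accuracy(truth, ocr, **weights):
    """Accuracy as 1 - errors / len(truth); the denominator is an assumption."""
    _, n_edits = count_errors(truth, ocr, **weights)
    return 1 - n_edits / len(truth)


if __name__ == "__main__":
    truth, ocr = "modern", "modem"  # the classic "rn" read as "m" confusion
    print(count_errors(truth, ocr))              # unit weights
    print(count_errors(truth, ocr, sub_cost=3))  # substitutions penalized
    print(ocr_accuracy(truth, ocr), ocr_accuracy(truth, ocr, sub_cost=3))
```

With unit weights the same pair of strings is charged two errors (one substitution and one deletion); with a substitution weight of 3 the cheapest alignment uses two deletions and one insertion, so three errors are counted. On a six-character ground truth that moves the reported accuracy from about 0.67 to 0.5, which is the kind of method-dependent discrepancy the abstract describes.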

Bibliographic record

Source: Document Recognition II, 1995, pp. 184-195 (12 pages)
Author: Jonathan S. Sandberg, Panasonic Technologies, Inc., Princeton, NJ, USA
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

