Utilizing Web Data in Identification and Correction of OCR Errors

Abstract

In this paper, we report on our experiments for detection and correction of OCR errors with web data. More specifically, we utilize Google search to access the big data resources available to identify possible candidates for correction. We then use a combination of the Longest Common Subsequences (LCS) and Bayesian estimates to automatically pick the proper candidate. Our experimental results on a small set of historical newspaper data show a recall and precision of 51% and 100%, respectively. The work in this paper further provides a detailed classification and analysis of all errors. In particular, we point out the shortcomings of our approach in its ability to suggest proper candidates to correct the remaining errors.
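
The abstract names the two ingredients used to pick a correction (LCS and Bayesian estimates) but does not give the scoring formula. The following is a minimal sketch of how such a combination might look, not the authors' implementation. It assumes that each candidate comes with a hit count (for example, taken from search results), that the prior is the candidate's relative frequency, and that the LCS ratio serves as a stand-in for the likelihood of the OCR token given the candidate. The names `lcs_length`, `rank_candidates`, and `candidate_counts` are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_candidates(ocr_token: str, candidate_counts: dict[str, int]) -> list[str]:
    """Rank candidates by prior * similarity, best first.

    Assumption (not from the paper): prior = relative hit count,
    similarity = LCS length divided by the longer string's length.
    """
    total = sum(candidate_counts.values())
    scored = []
    for candidate, count in candidate_counts.items():
        longest = max(len(ocr_token), len(candidate))
        if total == 0 or longest == 0:
            continue
        prior = count / total
        similarity = lcs_length(ocr_token, candidate) / longest
        scored.append((prior * similarity, candidate))
    # Sort by score descending; ties fall back to reverse alphabetical order.
    return [candidate for _, candidate in sorted(scored, reverse=True)]


# Illustrative counts only; these are made-up numbers, not data from the paper.
ranking = rank_candidates("tbe", {"the": 900, "tube": 50, "tie": 50})
```

In this toy input, "tube" shares a longer subsequence with "tbe" than "the" does, yet the much larger prior for "the" dominates the product. That interplay between string similarity and frequency is the kind of trade-off a combined LCS and Bayesian score has to settle.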

Bibliographic details

Source
SPIE Conference on Document Recognition and Retrieval | 2014 | 6 pages
Authors
Kazem Taghva; Shivam Agarwal
Original format: PDF
Chinese Library Classification: N-532
Keywords
Post Processing; Information Extraction; Mining; Error Identification; Error Correction; Big Data

Similar literature

1. OCR error correction using correction patterns and self-organizing migrating algorithm [J]. Nguyen Quoc-Dung, Le Duc-Anh, Phan Nguyet-Minh. Pattern Analysis and Applications. 2021, No. 2
2. Ontologies and Bigram-based approach for Isolated Non-word Errors Correction in OCR System [J]. Aicha Eutamene, Mohamed Khireddine Kholladi, Hacene Belhadef. International Journal of Electrical and Computer Engineering. 2015, No. 6
3. OCRSpell: an interactive spelling correction system for OCR errors in text [J]. Kazem Taghva, Eric Stofsky. International Journal on Document Analysis and Recognition. 2001, No. 3
4. Utilizing Web Data in Identification and Correction of OCR Errors [C]. Kazem Taghva, Shivam Agarwal. Document Recognition and Retrieval XXI. 2014
5. Utilizing big data in identification and correction of OCR errors [D]. Agarwal, Shivam. 2013
6. Identification and correction of systematic error in high-throughput sequence data [O]. Frazer Meacham, Dario Boffelli, Joseph Dhahbi. 2011
7. Utilizing Big Data in Identification and Correction of OCR Errors [O]. Agarwal, Shivam. 2013
