Conference: SPIE Conference on Document Recognition and Retrieval

Utilizing Web Data in Identification and Correction of OCR Errors

Abstract

In this paper, we report on our experiments in detecting and correcting OCR errors using web data. More specifically, we use Google search to tap the large data resources available on the web and identify possible correction candidates. We then use a combination of the Longest Common Subsequence (LCS) and Bayesian estimates to automatically pick the proper candidate. Our experimental results on a small set of historical newspaper data show a recall of 51% and a precision of 100%. The paper further provides a detailed classification and analysis of all errors. In particular, we point out the shortcomings of our approach in suggesting proper candidates to correct the remaining errors.
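The abstract does not say exactly how LCS and the Bayesian estimate are combined. The sketch below assumes a simple noisy-channel style score. A prior P(c) comes from each candidate's relative frequency, for example how often it appears among web search suggestions; these counts are a hypothetical input here. It is multiplied by a normalized LCS similarity, which stands in for the likelihood P(o | c) of seeing the OCR token o given the intended word c. The names `lcs_length`, `lcs_similarity` and `pick_candidate`, and the candidate-count dictionary, are illustrative assumptions. This is not the authors' implementation.

```python
from typing import Mapping, Optional


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP, O(len(a)*len(b)))."""
    # prev[j] holds the LCS length of the prefix of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a char from a or b
        prev = curr
    return prev[-1]


def lcs_similarity(ocr_word: str, candidate: str) -> float:
    """Normalized LCS in [0, 1]; equals 1.0 only when the strings are identical."""
    if not ocr_word and not candidate:
        return 1.0
    return 2 * lcs_length(ocr_word, candidate) / (len(ocr_word) + len(candidate))


def pick_candidate(ocr_word: str, candidate_counts: Mapping[str, int]) -> Optional[str]:
    """Pick the candidate maximizing prior * similarity (assumed scoring, not the paper's).

    candidate_counts maps each correction candidate to a hypothetical frequency,
    e.g. how often it was suggested by web search. Returns None if there are no
    counts or no candidate shares any character with the OCR token.
    """
    total = sum(candidate_counts.values())
    if total == 0:
        return None
    best, best_score = None, 0.0
    for cand, count in candidate_counts.items():
        prior = count / total  # assumed prior P(c) from relative frequency
        likelihood = lcs_similarity(ocr_word.lower(), cand.lower())  # stand-in for P(o | c)
        score = prior * likelihood
        if score > best_score:  # strict '>' keeps the first candidate on ties
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Hypothetical OCR token and made-up candidate counts, for illustration only.
    print(pick_candidate("tbe", {"the": 1000, "tube": 50, "toe": 30}))
```

For the token "tbe", "tube" has the highest LCS similarity (6/7, about 0.86), but its small prior lets the far more frequent "the" win the combined score. A multiplicative score was chosen because it mirrors the usual prior-times-likelihood form of a Bayesian decision. A real system would calibrate the likelihood rather than use raw LCS similarity.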