Utilizing big data in identification and correction of OCR errors.

Abstract

In this thesis, we report on our experiments in detecting and correcting OCR errors with web data. More specifically, we use Google search to access the available big data resources and identify possible candidates for correction. We then use a combination of the Longest Common Subsequence (LCS) and Bayesian estimates to automatically pick the proper candidate.

Our experimental results on a small set of historical newspaper data show a recall of 51% and a precision of 100%. The thesis further provides a detailed classification and analysis of all errors. In particular, we point out where our approach falls short in suggesting proper candidates to correct the remaining errors.
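The candidate-selection step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the frequency dictionary, and the scoring rule (LCS similarity multiplied by a relative-frequency prior) are all assumptions made for illustration, and the candidate counts are invented stand-ins for whatever a web search would return.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def pick_candidate(ocr_token: str, candidate_counts: dict) -> str:
    """Pick the candidate with the best combined score.

    candidate_counts is a hypothetical mapping of candidate word -> how often
    it was observed (for example, in search results). The score is an assumed
    combination: normalized LCS similarity times a frequency-based prior.
    """
    total = sum(candidate_counts.values())

    def score(candidate: str) -> float:
        similarity = lcs_length(ocr_token, candidate) / max(len(ocr_token), len(candidate))
        prior = candidate_counts[candidate] / total
        return similarity * prior

    return max(candidate_counts, key=score)


# Illustrative usage with made-up counts for the misread token "tbe".
best = pick_candidate("tbe", {"the": 900, "tube": 50, "tie": 50})
```

With these made-up counts, "tube" has the highest LCS similarity to "tbe", but the much larger prior for "the" dominates the combined score, which is the kind of trade-off a similarity-plus-prior scheme is meant to capture.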


Bibliographic record

Author: Agarwal, Shivam
Author affiliation: University of Nevada, Las Vegas
Degree-granting institution: University of Nevada, Las Vegas
Subjects: Computer science; Web studies; Information technology
Degree: M.S.C.S.
Year: 2013
Pages: 63 p.
Total pages: 63
Original format: PDF
Language: English

Similar literature

1. Correction: Koutsouris et al. Utilization of Global Precipitation Datasets in Data Limited Regions: A Case Study of Kilombero Valley, Tanzania. Atmosphere, 2017, 8, 246 [J]. Alexander J. Koutsouris, Jan Seibert, Steve W. Lyon. Atmosphere. 2018, No. 4
2. A drift correction method of E-nose data based on wavelet packet decomposition and no-load data: Case study on the robust identification of Chinese spirits [J]. Wang Yanfang, Yin Yong, Ge Fei. Sensors and Actuators. 2019, issue AUGa
3. Utilizing Web Data in Identification and Correction of OCR Errors [C]. Kazem Taghva, Shivam Agarwal. Document Recognition and Retrieval XXI. 2014
4. Utilizing automatic identification tracking systems to compile operational field and structure data [D]. Majekodunmi, Muyinat O. 2014
5. Resource Utilization in the First 2 Years Following Operative Correction for Tetralogy of Fallot: Study Using Data From the Optum's De-Identified Clinformatics Data Mart Insurance Claims Database [O]. Michael L. O'Byrne, Grace DeCost, Hannah Katcoff, 2020
6. Utilizing Big Data in Identification and Correction of OCR Errors [O]. Agarwal Shivam. 2013
