Model Based Restoration of Document Images for OCR

Abstract

This paper presents a methodology for model-based restoration of degraded document imagery. The methodology can adapt to nonuniform page degradations, and it is based on a model of image defects that is estimated directly from a set of calibrating degraded document images. Further, unlike global filtering schemes, our methodology filters only those words that the OCR has, with high probability, misspelled. In the first stage of the process, we extract a training sample of candidate misspelled word subimages from the set of calibration images before and after the degradation that we wish to undo. These word subimages are registered to extract defect pixels. The second stage uses a vector quantization (VQ) based algorithm to construct a summary model of the defect pixels. The final stage uses the summary model to restore degraded document images. We evaluate the performance of the methodology for a variety of parameter settings on a real-world sample of degraded FAX-transmitted documents. In our sample experiment, the methodology eliminates up to 56.4% of the OCR character errors introduced by FAX transmission.
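The three stages described in the abstract can be illustrated with a small sketch. The abstract gives no algorithmic detail, so everything below is an assumption made for illustration and not the authors' implementation: images are binary lists of lists that are already registered, a defect is described by its 3x3 neighborhood in the degraded image, the summary model is a codebook trained with a generic Lloyd iteration, and restoration flips any pixel whose neighborhood matches a codeword. All function names and the max_dist parameter are hypothetical.

```python
import random

def neighborhood(img, r, c, k=1):
    """Flatten the (2k+1)x(2k+1) window around (r, c); off-image pixels count as 0."""
    h, w = len(img), len(img[0])
    return [img[i][j] if 0 <= i < h and 0 <= j < w else 0
            for i in range(r - k, r + k + 1)
            for j in range(c - k, c + k + 1)]

def defect_vectors(clean, degraded):
    """Stage 1 (assumed form): for registered binary images, collect the
    degraded-image neighborhood of every pixel that differs from the clean image."""
    return [neighborhood(degraded, r, c)
            for r in range(len(clean))
            for c in range(len(clean[0]))
            if clean[r][c] != degraded[r][c]]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def train_codebook(vectors, size, iters=20, seed=0):
    """Stage 2 (generic Lloyd iteration, not necessarily the paper's VQ variant)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def restore(degraded, codebook, max_dist=0.5):
    """Stage 3 (assumed rule): flip a pixel when its neighborhood lies within
    max_dist of the nearest codeword, i.e. when it resembles a known defect."""
    out = [row[:] for row in degraded]
    for r in range(len(degraded)):
        for c in range(len(degraded[0])):
            v = neighborhood(degraded, r, c)
            best = codebook[nearest(codebook, v)]
            if sq_dist(best, v) <= max_dist:
                out[r][c] = 1 - degraded[r][c]
    return out

# Toy calibration pair: a vertical stroke, and the same stroke with a one-pixel break.
clean = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]
degraded = [row[:] for row in clean]
degraded[2][2] = 0

vectors = defect_vectors(clean, degraded)
codebook = train_codebook(vectors, size=1)
restored = restore(degraded, codebook)
```

On this toy pair the only defect is the broken stroke pixel, so the codebook holds a single "gap between two stroke pixels" pattern and restore fills the gap. A realistic system would also need the word selection and registration steps that the abstract mentions, which this sketch leaves out.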

Bibliographic details

Authors: M. Y. Jaisimha; Eve A. Riskin; Richard Ladner
Year: 1996
Pages: 1-12
Total pages: 12
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords: Document Image Restoration - Optical Character Recognition - Filtering Vector Quantization

Similar literature

1. Model-based information extraction method tolerant of OCR errors for document images [J]. Yasuto Ishitani, Toshihiro Nakamura. 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication. 2001, No. 711.

2. Model-based information extraction method tolerant of OCR errors for document images [J]. Yasuto Ishitani, Toshihiro Nakamura. 電子情報通信学会技術研究報告. パターン認識·メディア理解. Pattern Recognition and Media Understanding. 2001, No. 712.

3. Model-based restoration of document images for OCR [C]. Mysore Y. Jaisimha, MathSoft, Inc. Document Recognition III. 1996.

4. Identification and restoration of images based on overall modeling of the imaging process [D]. Pavlovic, Gordana Miroslav. 1991.

5. Restoration of Motion-Blurred Image Based on Border Deformation Detection: A Traffic Sign Restoration Model [O]. Yiliang Zeng, Jinhui Lan, Bin Ran.

6. Model-based Iterative Restoration for Binary Document Image Compression with Dictionary Learning [O]. Guo, Yandong; Lu, Cheng; Allebach, Jan P. 2017.

7. Image-Restoration Technique Based on a Model of the Image-Formation Process [R]. Myers, G. K. 1981.
