The Other C: Correcting OCR Words in the Presence of Diacritical Marks

Abstract

We propose a lexicon-based method for correcting a word recognized by an OCR engine (a classifier). This postprocessing method was originally designed for language models that support diacritical marks, such as Portuguese. Since the classifier can confuse these special marks with noise, wrong predictions can result if only the top hypothesis per glyph of the original image is preserved. To cope with this, our method uses a filtering strategy to select the best hypotheses for each glyph, which are used to produce candidate queries. A best query is selected in terms of confidence rate and edit distance to the word. A similarity search method over the best query suggests a correction. Experiments show the method improves prediction accuracy considerably for Portuguese word correction.
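
The pipeline outlined in the abstract (filter per-glyph hypotheses, build candidate queries, pick a best query, then run a similarity search) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `min_conf` and `top_k` thresholds, the use of mean confidence, the reading of "edit distance to the word" as distance to the nearest lexicon entry, and the toy confidences and lexicon are all assumptions made for illustration.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def filter_hypotheses(glyph_hyps, min_conf=0.2, top_k=2):
    """Keep at most top_k hypotheses per glyph with confidence >= min_conf.

    Both thresholds are hypothetical; the abstract does not state them.
    """
    kept = []
    for hyps in glyph_hyps:
        ranked = sorted(hyps, key=lambda h: h[1], reverse=True)
        strong = [h for h in ranked[:top_k] if h[1] >= min_conf]
        kept.append(strong or ranked[:1])  # never drop a glyph entirely
    return kept


def candidate_queries(filtered):
    """Yield (string, mean confidence) for each combination of kept hypotheses."""
    for combo in product(*filtered):
        text = "".join(ch for ch, _ in combo)
        conf = sum(c for _, c in combo) / len(combo)
        yield text, conf


def correct(glyph_hyps, lexicon):
    """Pick the best query, then return its nearest lexicon word.

    Assumed ranking: smaller edit distance first, higher confidence second.
    """
    best = None
    for query, conf in candidate_queries(filter_hypotheses(glyph_hyps)):
        word = min(lexicon, key=lambda w: edit_distance(query, w))
        key = (edit_distance(query, word), -conf)
        if best is None or key < best[0]:
            best = (key, word)
    return best[1]


# Toy example: the classifier slightly prefers the unaccented glyphs.
hyps = [
    [("a", 0.90)],
    [("c", 0.55), ("ç", 0.40)],
    [("a", 0.50), ("ã", 0.45)],
    [("o", 0.95)],
]
lexicon = ["ação", "acho", "aço", "caco"]
print(correct(hyps, lexicon))  # ação
```

With only the top hypothesis per glyph, the single query would be "acao", whose nearest entry in this toy lexicon is "acho"; keeping the second-ranked diacritic hypotheses lets the exact match "ação" surface. A linear scan over the lexicon stands in for the similarity search here; a real system would likely use an index.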

Bibliographic details

Source: International Conference on Computational Processing of Portuguese, 2018, pp. 222-230 (9 pages)
Authors: Sergio Luis Sardi Mergen; Leonardo de Abreu Schmidt
Original format: PDF
Keywords: OCR; Similarity search; Classifier

Similar literature

1. Do Diacritical Marks Play a Role at the Early Stages of Word Recognition in Arabic? [J]. Manuel Perea, Reem Abu Mallouh, Ahmed Mohammed. Frontiers in Psychology. 2016, issue 4.
2. Diacritical Language OCR Based on Neural Network: Case of Amazigh Language [J]. Khadija EL Gajoui, Fadoua Ataa Allah, Mohammed Oumsis. Procedia Computer Science. 2015, issue 1.
3. The Other C: Correcting OCR Words in the Presence of Diacritical Marks [C]. Sergio Luis Sardi Mergen, Leonardo de Abreu Schmidt. International Workshop on Computational Processing of the Portuguese Language. 2018.
4. Curriculum-based measurement: Investigating the relationship between oral and silent reading comprehension and words correct per minute. [D]. Hale, Andrea Dawn. 2005.
5. Do Diacritical Marks Play a Role at the Early Stages of Word Recognition in Arabic? [O]. Manuel Perea, Reem Abu Mallouh, Ahmed Mohammed.
6. Czech Words with Diacritical Marks [O]. Tvrdik Stanislav. 1982.
