Error Correction vs. Query Garbling for Arabic OCR Document Retrieval

KAREEM DARWISH; WALID MAGDY


Abstract

Due to the existence of large numbers of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents continues to be an important problem. This article compares the effect of OCR error correction, with and without language modeling, and the effect of query garbling with weighted structured queries on the retrieval of OCR-degraded Arabic documents. The results suggest that moderate error correction does not yield a statistically significant improvement in retrieval effectiveness when indexing and searching using n-grams. Also, reversing error correction models to perform query garbling in conjunction with weighted structured queries yields improved retrieval effectiveness. Lastly, using very good error correction that utilizes language modeling yields the best improvement in retrieval effectiveness.
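The idea of query garbling can be illustrated with a minimal sketch: rather than correcting the OCR'ed text, each query term is expanded into the renderings an OCR engine would plausibly produce, and every rendering carries a weight. The abstract does not give the article's actual model or query syntax, so everything below is an assumption made for illustration: the `CONFUSION` table and its probabilities, the function names `garble` and `weighted_query`, and the Latin toy characters standing in for Arabic ones.

```python
from itertools import product

# Hypothetical character confusion model: P(OCR output | true character).
# The probabilities are illustrative only, not taken from the article.
CONFUSION = {
    "a": {"a": 0.9, "o": 0.1},
    "l": {"l": 0.8, "1": 0.2},
}

def garble(term, model, top_k=3):
    """Return the top_k most probable OCR renderings of term, with weights."""
    # Characters missing from the model are assumed to be recognized correctly.
    options = [list(model.get(ch, {ch: 1.0}).items()) for ch in term]
    variants = []
    for combo in product(*options):
        text = "".join(ch for ch, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        variants.append((text, prob))
    # Most probable first; ties are broken alphabetically for determinism.
    variants.sort(key=lambda v: (-v[1], v[0]))
    top = variants[:top_k]
    total = sum(p for _, p in top)
    # Renormalize so the kept variants' weights sum to 1.
    return [(text, p / total) for text, p in top]

def weighted_query(terms, model, top_k=3):
    """Map each query term to its weighted set of garbled alternatives."""
    return {term: garble(term, model, top_k) for term in terms}

if __name__ == "__main__":
    for term, alternatives in weighted_query(["al"], CONFUSION).items():
        print(term, alternatives)
```

Enumerating every character combination grows exponentially with term length, so this form only suits short toy inputs; a practical version would need pruning or a beam search. The resulting mapping would then be handed to a retrieval engine that supports weighted alternatives for a query term.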


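Bibliographic details

Source: ACM Transactions on Information Systems, 2008, Issue 1, pp. 5.1-5.14 (14 pages)
Authors: KAREEM DARWISH; WALID MAGDY
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
CLC classification: Computing technology, computer technology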

Similar literature

1. Effect of OCR error correction on Arabic retrieval [J]. Walid Magdy, Kareem Darwish. Information Retrieval, 2008, Issue 5.
2. Combination Approaches in Korean Information Retrieval: Words vs. n-grams, and Query Translation vs. Document Translation [J]. IN-SU KANG, SEUNG-HOON NA, JONG-HYEOK LEE. International Journal of Computer Processing of Oriental Languages, 2006, Issue 2a3.
3. Enhanced Arabic Document Retrieval Using Optimized Query Paraphrasing [J]. Abeer Al-Dayel, Mourad Ykhlef. Arabian Journal for Science and Engineering, 2015, Issue 11.
4. Word-Based Correction for Retrieval of Arabic OCR Degraded Documents [C]. Walid Magdy, Kareem Darwish. String Processing and Information Retrieval; Lecture Notes in Computer Science; 4209, 2006.
5. Utilizing big data in identification and correction of OCR errors [D]. Agarwal, Shivam. 2013.
6. Towards Mobile OCR: How To Take a Good Picture of a Document Without Sight [O]. Michael Cutter, Roberto Manduchi.
7. Arabic OCR Error Correction Using Character Segment Correction, Language Modeling, and Shallow Morphology [O]. 2008.
8. Arabic Optical Character Recognition (OCR) Evaluation in Order to Develop a Post-OCR Module [R]. Kjersten, B. 2011.
