Journal: ACM Transactions on Information Systems

Error Correction vs. Query Garbling for Arabic OCR Document Retrieval

Abstract

Because large numbers of legacy documents (such as old books and newspapers) exist, improving retrieval effectiveness for OCR'ed documents continues to be an important problem. This article compares the effect of OCR error correction, with and without language modeling, and the effect of query garbling with weighted structured queries on the retrieval of OCR-degraded Arabic documents. The results suggest that moderate error correction does not yield a statistically significant improvement in retrieval effectiveness when indexing and searching using n-grams. Also, reversing error correction models to perform query garbling in conjunction with weighted structured queries yields improved retrieval effectiveness. Lastly, using very good error correction that utilizes language modeling yields the best improvement in retrieval effectiveness.
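
The idea of query garbling can be illustrated with a minimal sketch. It is not the article's implementation. The confusion table `CONFUSIONS`, its probabilities, and the helper names `garble` and `char_ngrams` are hypothetical, and Latin characters stand in for Arabic ones to keep the example readable. The sketch assumes a character-level model of how OCR distorts each character, applies it to a query term to produce likely degraded forms, and keeps each form's probability so it can serve as a weight in a structured query.

```python
from itertools import product

# Hypothetical confusion model: P(OCR output | true character).
# The values are invented for illustration and do not come from the article.
CONFUSIONS = {
    "m": {"m": 0.8, "rn": 0.2},
    "l": {"l": 0.9, "1": 0.1},
}


def garble(term, confusions, top_k=5):
    """Return up to top_k (garbled_form, probability) pairs for a query term.

    Characters missing from the confusion table are assumed to be
    recognized correctly with probability 1.0.
    """
    per_char = [list(confusions.get(ch, {ch: 1.0}).items()) for ch in term]
    variants = {}
    for combo in product(*per_char):
        form = "".join(output for output, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        # Different character choices can yield the same string; sum them.
        variants[form] = variants.get(form, 0.0) + prob
    # Most probable forms first; ties broken alphabetically for stable output.
    ranked = sorted(variants.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def char_ngrams(term, n=3):
    """Split a term into overlapping character n-grams for indexing."""
    grams = [term[i:i + n] for i in range(len(term) - n + 1)]
    return grams or [term]  # terms shorter than n are kept whole


if __name__ == "__main__":
    for form, weight in garble("mail", CONFUSIONS):
        print(f"{weight:.2f}  {form}  {char_ngrams(form)}")
```

Each weighted form would then be passed to a retrieval engine that supports weighted alternatives for a single query term. The exhaustive enumeration grows exponentially with term length, so a practical version would prune low-probability forms during generation.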
