Journal of Imaging

A Holistic Technique for an Arabic OCR System

Abstract

Analytical approaches to Optical Character Recognition (OCR) can suffer from a significant number of segmentation errors, especially for cursive scripts such as Arabic, where characters frequently overlap. Holistic approaches, which treat whole words as single units, were introduced as an effective way to avoid such segmentation errors. The main challenge for these approaches remains their computational complexity, especially in large-vocabulary applications. In this paper, we introduce a computationally efficient, holistic Arabic OCR system. A lexicon reduction approach based on clustering similarly shaped words is used to reduce recognition time. By combining global word-level Discrete Cosine Transform (DCT) features with local block-based features, the proposed approach generalizes to new font sizes that were not included in the training data. Evaluation results on different test sets from modern and historical Arabic books are promising compared with state-of-the-art Arabic OCR systems.
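
The abstract names two ideas that a small example may make more concrete: global word-level DCT features and lexicon reduction through clusters of similarly shaped words. The following is only a rough sketch under stated assumptions, not the system described in the paper. It assumes word images are already normalized to a common size, that clusters have been computed beforehand by some unspecified method, and it omits the local block-based features. All function and parameter names (dct_matrix, word_dct_features, nearest, recognize, keep) are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def word_dct_features(img, keep=4):
    """Global feature vector: the keep x keep low-frequency 2-D DCT block.

    Assumption: img is a size-normalized 2-D array holding one whole word.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    coeffs = dct_matrix(h) @ img @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

def nearest(vec, candidates):
    """Index of the row of candidates closest to vec (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(candidates - vec, axis=1)))

def recognize(img, centroids, cluster_words, cluster_feats):
    """Two-stage lookup: choose the nearest cluster, then search only its words.

    centroids:     (n_clusters, d) array of cluster centers (assumed given)
    cluster_words: list of word-label lists, one per cluster
    cluster_feats: list of (n_words_in_cluster, d) feature arrays
    """
    f = word_dct_features(img)
    c = nearest(f, centroids)
    return cluster_words[c][nearest(f, cluster_feats[c])]
```

In this sketch the second stage compares the input against the words of one cluster only, rather than against the full lexicon, which is the intuition behind reducing recognition time for large vocabularies.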
