Journal: International Journal on Document Analysis and Recognition
Nastalique segmentation-based approach for Urdu OCR

Abstract

Much work on Arabic-language optical character recognition (OCR) has focused on the Naskh writing style. The Nastalique style, used for most languages written in Arabic script across Southern Asia, is much more challenging to process because of its compactness, cursiveness, higher context sensitivity and diagonality. These properties make Nastalique writing more complex, with multiple letters horizontally overlapping each other. For these reasons, existing methods used for Naskh do not work for Nastalique, and most work on Nastalique has therefore used non-segmentation methods. This paper presents a new segmentation-based approach to analysing the Nastalique style. It explains the complexity of Nastalique and why Naskh-based techniques cannot work for it, and proposes a segmentation-based method for developing a Nastalique OCR, deriving principles and techniques for pre-processing and recognition. The OCR is developed for the Urdu language. The system is optimized using 79,093 instances of 5,249 main bodies derived from a corpus of 18 million words, giving a recognition accuracy of 97.11%. It is then tested on document images of books, with 87.44% main body recognition accuracy. The work is extensible to other languages that use Nastalique.
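The abstract does not say which Naskh techniques fail on Nastalique. As an illustration only, the sketch below assumes one widely used family, segmentation by vertical projection profile, and shows why horizontal overlap defeats it. The toy images and the function names `column_profile` and `split_at_empty_columns` are hypothetical and are not taken from the paper.

```python
def column_profile(image):
    """Count ink pixels in each column of a binary image (list of rows of 0/1)."""
    return [sum(column) for column in zip(*image)]


def split_at_empty_columns(image):
    """Return (start, end) column spans that are separated by ink-free columns."""
    profile = column_profile(image)
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a run of inked columns begins
        elif not ink and start is not None:
            spans.append((start, x))       # an empty column closes the run
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


# Two toy shapes sitting side by side, with one empty column between them.
side_by_side = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# The same idea arranged diagonally: the shapes do not touch (row 2 is empty),
# but both have ink in column 2, so they overlap horizontally.
diagonal_overlap = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

print(split_at_empty_columns(side_by_side))      # two spans
print(split_at_empty_columns(diagonal_overlap))  # a single merged span
```

In the side-by-side case the empty column yields two spans, while in the diagonal case no column is free of ink, so the two shapes are returned as one span even though they never touch. This is only a toy picture of the overlap problem the abstract describes, not the segmentation method the paper proposes.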