Source: Journal of Electronic Imaging

Contour-based character segmentation for printed Arabic text with diacritics



Abstract

Current developments in sensors open new uses across numerous real-life applications, including optical character recognition (OCR). An OCR system requires text processing tools to be incorporated into the sensor functionality. The most critical stage in an OCR system is segmentation: subdividing a text image into characters, each of which can then be processed individually by a classifier. Arabic character segmentation is a very challenging task because of the cursive nature of the script, in which each character takes different shapes according to its position in the word, and because of the presence of diacritics. A robust offline character segmentation algorithm for printed Arabic text with diacritics is developed based on the contour extraction technique. The algorithm extracts the upper-contour ("up-contour") part of a word and then identifies the splitting areas between the word's characters. A postprocessing stage then handles the oversegmentation problems that appear in the initial segmentation stage. The proposed scheme is benchmarked using the APTI dataset and a manually collected dataset consisting of text images varying in font size, type, and style, for more than 38,000 words. The experiments show that the proposed algorithm segments Arabic words with diacritics with an average accuracy of 98.5%. © 2019 SPIE and IS&T
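The abstract names three steps (upper-contour extraction, split-area identification, and postprocessing of oversegmentation) without giving details. The following is a minimal sketch of that general idea on a toy binary image, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the baseline estimate (the row with the most ink), the rule that a split candidate is a run of columns whose upper contour sits on the baseline, the `tol` and `min_gap` parameters, and the toy word itself, which contains no diacritics.

```python
import numpy as np


def toy_word():
    """Hypothetical 8x14 binary 'word': three blobs joined by a baseline stroke."""
    ink = np.zeros((8, 14), dtype=bool)
    ink[5, :] = True          # baseline stroke connecting the characters
    ink[2:6, 0:3] = True      # first character body
    ink[1:6, 6:9] = True      # second character body
    ink[3:6, 11:14] = True    # third character body
    return ink


def upper_contour(ink):
    """Row index of the topmost ink pixel in each column (image height if empty)."""
    top = np.argmax(ink, axis=0)              # first True row per column
    top[~ink.any(axis=0)] = ink.shape[0]      # mark columns with no ink
    return top


def estimate_baseline(ink):
    """Assumed baseline: the row holding the most ink pixels."""
    return int(np.argmax(ink.sum(axis=1)))


def candidate_splits(top, baseline, tol=0):
    """Centers of interior column runs whose upper contour lies on the baseline."""
    low = np.abs(top - baseline) <= tol
    splits, start = [], None
    for x in range(len(low) + 1):
        inside = x < len(low) and low[x]
        if inside and start is None:
            start = x
        elif not inside and start is not None:
            end = x - 1
            # Runs touching the image border do not separate two characters.
            if start > 0 and end < len(low) - 1:
                splits.append((start + end) // 2)
            start = None
    return splits


def merge_close_splits(splits, min_gap):
    """Toy postprocessing: drop a split that is closer than min_gap to the last kept one."""
    kept = []
    for s in sorted(splits):
        if not kept or s - kept[-1] >= min_gap:
            kept.append(s)
    return kept


if __name__ == "__main__":
    ink = toy_word()
    top = upper_contour(ink)
    splits = candidate_splits(top, estimate_baseline(ink))
    print(merge_close_splits(splits, min_gap=3))
```

On the toy word this yields split columns 4 and 9, the centers of the two connecting strokes. Real printed Arabic would additionally need diacritic handling and font-dependent thresholds, which the abstract mentions but does not specify.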
