Journal: International Journal on Document Analysis and Recognition (IJDAR)
IESK-ArDB: a database for handwritten Arabic and an optimized topological segmentation approach

Abstract

Although much research has been conducted on unconstrained handwriting recognition, an effective solution remains a serious challenge. In this article, we address two issues related to Arabic handwriting recognition. First, we present IESK-arDB, a new multi-purpose off-line Arabic handwriting database. It is publicly available and contains more than 4,000 word images, each accompanied by a binary version, a thinned version, and ground truth information stored in a separate XML file. Additionally, it contains around 6,000 character images segmented from the database. A letter frequency analysis showed that the database exhibits letter frequencies similar to those of large corpora of digital text, which supports the database's usefulness. Second, we propose a multi-phase segmentation approach that starts by detecting and resolving sub-word overlaps, then hypothesizes a large number of segmentation points that are later reduced by a set of heuristic rules. The proposed approach has been successfully tested on IESK-arDB. The results were very promising, indicating the efficiency of the suggested approach.
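
The letter frequency comparison mentioned in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the authors measured similarity, so cosine similarity is a stand-in chosen here, and the function names and the tiny word lists are hypothetical, not drawn from IESK-arDB or any real corpus.

```python
from collections import Counter
from math import sqrt


def letter_frequencies(words):
    """Return the relative frequency of each alphabetic character in the words."""
    counts = Counter(ch for word in words for ch in word if ch.isalpha())
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency dictionaries."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical toy data standing in for database labels and a reference corpus.
database_words = ["كتاب", "باب", "كاتب"]
corpus_words = ["كتب", "باب", "بيت", "كاتب"]

db_freq = letter_frequencies(database_words)
corpus_freq = letter_frequencies(corpus_words)
similarity = cosine_similarity(db_freq, corpus_freq)
print(f"cosine similarity of letter distributions: {similarity:.3f}")
```

A value close to 1 would suggest that the two letter distributions are similar. With real data, the word lists would come from the database's ground truth files and from a large text corpus.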
