Journal: The International Arab Journal of Information Technology
UCOM Offline Dataset-An Urdu Handwritten Dataset Generation

Abstract

A benchmark database is essential for the efficient and robust development of character recognition systems. Unfortunately, there is no comprehensive handwritten dataset for the Urdu language that can be used to compare state-of-the-art techniques in optical character recognition. In this paper, we present a new, publicly available dataset comprising 600 pages of handwritten Urdu text written in Nasta'liq style, together with detailed ground truth for evaluating handwritten Urdu character recognition. The dataset contains text lines written in Nasta'liq style by a limited number of individuals on A4-size paper. The handwritten pages were scanned and the text lines were segmented. The UCOM database covers all Urdu characters and ligatures with their variations, in addition to Urdu numeric data. In this dataset, a ligature is taken to consist of up to five characters. The UCOM dataset can be used for handwritten character recognition as well as writer identification. We propose and evaluate the strength of Recurrent Neural Networks (RNNs) on sample text lines from the UCOM offline database.
