Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

KHATT: An open Arabic offline handwritten text database

Abstract

A comprehensive Arabic handwritten text database is an essential resource for Arabic handwritten text recognition research, especially given the lack of such a database for Arabic handwritten text. In this paper, we report our comprehensive Arabic offline handwritten text database (KHATT), consisting of 1000 handwritten forms written by 1000 distinct writers from different countries. The forms were scanned at 200, 300, and 600 dpi resolutions. The database contains 2000 randomly selected paragraphs from 46 sources, 2000 minimal-text paragraphs covering all the shapes of Arabic characters, and optionally written paragraphs on open subjects. The 2000 random text paragraphs consist of 9327 lines. The database forms were randomly divided into 70%, 15%, and 15% sets for training, testing, and verification, respectively. This enables researchers to use the database and compare their results. A formal verification procedure is implemented to align the handwritten text with its ground truth at the form, paragraph, and line levels. The verified ground truth database contains metadata describing the written text at the page, paragraph, and line levels in text and XML formats. Tools to extract paragraphs from pages and to segment paragraphs into lines were developed. In addition, we present our experimental results on the database using two classifiers, namely Hidden Markov Models (HMM) and our novel syntactic classifier. The database is made freely available to researchers worldwide for research on various handwriting-related problems such as text recognition, writer identification and verification, forms analysis, preprocessing, and segmentation. Several international research groups and researchers have acquired the database for use in their research so far.
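
The random 70%/15%/15% division of forms described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `split_forms`, the integer form IDs 1 to 1000, and the fixed seed are assumptions made only for illustration. Because each form comes from a distinct writer, splitting by form also keeps the writers in the three sets disjoint.

```python
import random


def split_forms(form_ids, percents=(70, 15, 15), seed=0):
    """Randomly partition form IDs into training, testing, and verification sets.

    Hypothetical helper: the name, the seed, and the ID scheme are assumptions,
    not part of the KHATT distribution.
    """
    ids = list(form_ids)
    # A seeded generator makes the shuffle reproducible between runs.
    random.Random(seed).shuffle(ids)

    # Integer arithmetic avoids floating-point rounding in the set sizes.
    n_train = len(ids) * percents[0] // 100
    n_test = len(ids) * percents[1] // 100

    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    verification = ids[n_train + n_test:]  # the remainder goes to verification
    return train, test, verification


# Assumed ID scheme: one integer per form, 1000 forms in total.
train, test, verification = split_forms(range(1, 1001))
print(len(train), len(test), len(verification))  # 700 150 150
```

Researchers who want to compare results with published work should use the split shipped with the database rather than regenerate one, since a different shuffle yields different sets.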