Conference: International Conference on Data Networks, Communications, Computers

A Benchmark Dataset for Devnagari Document Recognition Research

Abstract

Developing an efficient and reliable recognition system requires a benchmark dataset. Unfortunately, no comprehensive benchmark dataset exists for handwritten Devnagari optical document recognition research, at least in the public domain. This paper is an effort in this direction. We introduce a comprehensive dataset, which we refer to as the CPAR-2012 dataset, for such benchmark studies, and present some preliminary recognition results. The dataset includes 35,000 isolated handwritten numerals, 83,300 characters, 2,000 constrained and 2,000 unconstrained handwritten pangrams. It is organized in a relational data model that contains text images along with their writers' information and related handwriting attributes. We collected the handwriting samples from 2,000 subjects chosen from different age, ethnic, educational, regional, and linguistic groups. The samples reflect the expected variations in Devnagari handwriting. For benchmarking, we also report digit recognition results obtained with recognition schemes that use simple features, four neural network classifiers, KNN, and a classifier ensemble.
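
The abstract does not give the layout of the relational data model, so the following is only a minimal sketch of how text images might be linked to writer information in such a model. Every table name, column name, and sample value here is a hypothetical assumption for illustration and is not taken from the CPAR-2012 dataset.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual CPAR-2012 layout.
SCHEMA = """
CREATE TABLE writer (
    writer_id INTEGER PRIMARY KEY,
    age       INTEGER,
    education TEXT,
    region    TEXT
);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    writer_id INTEGER NOT NULL REFERENCES writer(writer_id),
    kind      TEXT NOT NULL CHECK (kind IN ('numeral', 'character', 'pangram')),
    label     TEXT,
    image     BLOB NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up writer and three samples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO writer VALUES (1, 25, 'graduate', 'north')")
    conn.executemany(
        "INSERT INTO sample (writer_id, kind, label, image) VALUES (?, ?, ?, ?)",
        [
            (1, "numeral", "3", b"\x00"),    # placeholder bytes stand in for an image
            (1, "numeral", "7", b"\x00"),
            (1, "character", "ka", b"\x00"),
        ],
    )
    conn.commit()
    return conn


def samples_by_kind(conn, kind):
    """Return (label, writer age, writer region) for every sample of one kind."""
    rows = conn.execute(
        "SELECT s.label, w.age, w.region "
        "FROM sample AS s JOIN writer AS w ON w.writer_id = s.writer_id "
        "WHERE s.kind = ? ORDER BY s.sample_id",
        (kind,),
    )
    return rows.fetchall()
```

Keeping writer attributes in their own table lets a query select samples by writer group without duplicating that information on every image row.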