International Conference on Frontiers in Handwriting Recognition

Introducing the Boise State Bangla Handwriting Dataset and an Efficient Offline Recognizer of Isolated Bangla Characters

Abstract

This paper presents a publicly accessible dataset of offline Bangla handwriting, together with benchmark results from a simple and robust recognizer of isolated handwritten characters. The dataset is named the Boise State Bangla Handwriting Dataset. The data-collection form has two pages. The first page is a 104-word (364-character) essay that uses 49 basic characters, all 11 vowel diacritics, and 32 high-frequency consonant conjuncts. The second page contains 84 isolated units covering all basic characters, the numerals, the vowel diacritics, and several high-frequency conjuncts. The initial release is based on voluntary contributions from 100 different writers. A distinctive feature of the database is that all of its contents are tagged with ground-truth information at several levels of the component hierarchy: characters, words, and lines. The dataset is expected to be useful for research on offline Bangla handwriting recognition, particularly for segmentation-based approaches. In addition, the paper presents a basic character recognition method whose features are zonal pixel counts, structural strokes, and U-SURF descriptors computed at grid points and modeled with a bag of features. The highest classification accuracy, 95.4%, was obtained with a cubic-kernel SVM classifier on isolated characters from the Boise State dataset combined with three other datasets, which were included to demonstrate the versatility and robustness of the method.
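The recognizer combines several feature families before classification. The following is a minimal Python sketch of two of them and of the classifier's kernel, offered only as an illustration under stated assumptions: the binary-image representation, the zone count, hard nearest-codeword assignment, the `gamma`/`coef0` values, and every function name are hypothetical choices, not the authors' implementation. Computing the U-SURF descriptors themselves and the structural-stroke features is outside the sketch, and the codebook is taken as given (in practice such a vocabulary is often learned by k-means clustering).

```python
"""Hypothetical sketch: zonal pixel counts, a bag-of-features histogram,
and a degree-3 (cubic) polynomial kernel. Not the authors' code."""


def zonal_pixel_counts(image, zones=4):
    """Split a binary image (list of equal-length rows of 0/1 values) into a
    zones x zones grid and return the foreground fraction of each cell."""
    height, width = len(image), len(image[0])
    features = []
    for zr in range(zones):
        r0, r1 = zr * height // zones, (zr + 1) * height // zones
        for zc in range(zones):
            c0, c1 = zc * width // zones, (zc + 1) * width // zones
            area = (r1 - r0) * (c1 - c0)
            ink = sum(image[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # Empty cells (image smaller than the grid) contribute 0.0.
            features.append(ink / area if area else 0.0)
    return features


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_features(descriptors, codebook):
    """Hard-assign each local descriptor (e.g. a U-SURF vector computed at a
    grid point) to its nearest codeword; return the normalized histogram."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * len(codebook)


def cubic_kernel(x, y, gamma=1.0, coef0=1.0):
    """Degree-3 polynomial kernel: (gamma * <x, y> + coef0) ** 3."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** 3


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    zonal = zonal_pixel_counts(image, zones=2)  # [0.25, 0.25, 0.25, 0.25]
    codebook = [[0.0, 0.0], [1.0, 1.0]]         # toy two-word vocabulary
    bof = bag_of_features([[0.0, 0.0], [1.0, 1.0], [0.9, 1.0]], codebook)
    feature_vector = zonal + bof                # concatenated feature vector
    print(feature_vector)
    print(cubic_kernel(feature_vector, feature_vector))
```

Concatenating the zonal and bag-of-features vectors yields one fixed-length vector per character image regardless of how many local descriptors were extracted; a cubic-kernel SVM then compares such vectors pairwise through a degree-3 polynomial kernel like the one above.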