Journal: ACM Transactions on Asian Language Information Processing

Online Handwritten Gurmukhi Strokes Dataset Based on Minimal Set of Words



Abstract

Online handwriting data are integral to data analysis and classification research, because collected handwritten data pose many challenges for grouping strokes into classes. The present work groups handwritten strokes from the Indic script Gurmukhi, the script of the widely spoken Punjabi language. It develops a dataset of Gurmukhi words for online handwriting recognition in real-life applications such as map navigation. We collected data from 100 writers from the largest cities in the Punjab region. Writer variations, such as writing skill level (beginner, moderate, and expert), gender, right- or left-handedness, and adaptability to digital handwriting, were considered in developing the dataset. We introduce a novel technique to form handwritten stroke classes based on a limited set of words. The presence of all letters of the Gurmukhi script, including vowels, was considered before selecting each word. The developed dataset includes 39,411 strokes from handwritten words and forms 72 stroke classes after k-means clustering and manual verification by expert and moderate writers. Using a Hidden Markov Model, we achieved recognition rates of 87.10%, 85.43%, and 84.33% for middle-zone strokes when using 66%, 50%, and 80% of the developed dataset as training data. The present work is a step toward finding groups for unknown handwriting strokes with reasonably high accuracy.
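To make the clustering step concrete, the following is a minimal sketch of grouping online strokes with k-means. It is not the authors' pipeline: the abstract does not say how strokes were represented, so the arc-length resampling, the centroid and scale normalization, the farthest-point seeding, and all function names (`resample`, `to_feature`, `kmeans`) are assumptions made for illustration only.

```python
import math

def resample(stroke, n=16):
    """Resample a stroke (list of (x, y) pen points) to n points
    equally spaced along its arc length (assumed preprocessing)."""
    dists = [0.0]  # cumulative arc length at each original point
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0.0:  # a dot: repeat the single position
        return [stroke[0]] * n
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0.0 else (target - dists[j]) / seg
        out.append((stroke[j][0] + t * (stroke[j + 1][0] - stroke[j][0]),
                    stroke[j][1] + t * (stroke[j + 1][1] - stroke[j][1])))
    return out

def to_feature(stroke, n=16):
    """Flatten a resampled stroke into a position- and size-normalized vector."""
    pts = resample(stroke, n)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    cx, cy = sum(xs) / n, sum(ys) / n
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    feat = []
    for x, y in pts:
        feat.extend(((x - cx) / scale, (y - cy) / scale))
    return feat

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(features, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [list(features[0])]
    while len(centers) < k:
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(f, centers[c]))
                      for f in features]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [f for f, l in zip(features, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy usage: horizontal and vertical strokes of different sizes.
strokes = [
    [(0, 0), (5, 0), (10, 0)],
    [(2, 3), (4, 3), (9, 3)],
    [(0, 0), (0, 4), (0, 8)],
    [(7, 1), (7, 2), (7, 20)],
]
labels, _ = kmeans([to_feature(s) for s in strokes], k=2)
print(labels)
```

In a setting like the one described, the cluster labels would only be candidate stroke classes; the abstract notes that the final 72 classes were fixed after manual verification by expert and moderate writers.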
