Novel Word Recognition and Word Spotting Systems for Offline Urdu Handwriting.

Abstract

Word recognition for offline Arabic, Farsi and Urdu handwriting has attracted much attention in the OCR field. This thesis presents implementations of an offline Urdu Handwritten Word Recognition (HWR) system and an Urdu word spotting technique. It first introduces the creation of several offline CENPARMI Urdu databases, which were necessary for the offline Urdu HWR experiments. The Urdu HWR system followed a holistic recognition approach. Basic pre-processing was first applied to the images. In the feature extraction phase, gradient features were extracted from greyscale word images and structural features from binary word images. The recognition system extracted 592 feature sets, and these features helped improve the recognition results. The system was trained and tested on 57 words. Overall, we achieved a 97% accuracy rate for handwritten word recognition using an SVM classifier.

In the word spotting algorithm, candidate words were generated from the segmented connected components. These candidate words were sent to the holistic HWR system, which extracted the features and tried to recognize each image as one of the 57 words. After classification, each image was sent to a verification/rejection phase, which helped reject as many unseen (raw data) images as possible. Overall, we achieved 50% word spotting precision at a 70% recall rate.

Our word spotting technique used the holistic HWR system for recognition. The word spotting system consisted of two processes: the segmentation of handwritten connected components and diacritics from Urdu text lines, and the word spotting algorithm. A small database of handwritten text pages was created for testing the word spotting system; it consisted of texts written by ten native Urdu speakers. A rule-based segmentation system was applied to segment (extract) handwritten Urdu subwords, or connected components, from text lines. We achieved a 92% correct segmentation rate on 372 text lines.
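
The abstract mentions extracting connected components from text lines before candidate words are formed. As a rough illustration only, the sketch below labels 8-connected foreground regions in a small binary image and returns their bounding boxes ordered right to left, matching the Urdu writing direction. It is not the thesis's rule-based segmentation system; the function name `label_components`, the 0/1 list-of-rows image format, and the 8-connectivity choice are all assumptions made for this example.

```python
from collections import deque


def label_components(image):
    """Find 8-connected foreground regions in a binary image.

    `image` is a list of rows containing 0 (background) or 1 (ink).
    Returns bounding boxes as (top, left, bottom, right), ordered
    right to left. Hypothetical helper, not taken from the thesis.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Assumption: order components by right edge, rightmost first.
    boxes.sort(key=lambda box: -box[3])
    return boxes


# Toy "text line" with three separate strokes.
line = [
    [1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
]
print(label_components(line))
```

A real system would still need rules for attaching diacritics to their base components and for grouping components into candidate words, which the abstract attributes to the rule-based segmentation and word spotting stages.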

Bibliographic record

  • Author

    Sagheer, Malik Waqas

  • Author affiliation

    Concordia University (Canada)

  • Degree-granting institution: Concordia University (Canada)
  • Subject: Computer Science; Language Linguistics
  • Degree: M.Comp.Sc.
  • Year: 2010
  • Pages: 123 p.
  • Total pages: 123
  • Original format: PDF
  • Language of text: eng

  • Date added: 2022-08-17 11:37:09
