Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.


Abstract

The design of a machine that reads unconstrained words remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult. To avoid this, words are treated as a whole. This research investigates a number of features computed from whole words, specifically for the recognition of handwritten words. Arabic text classification and recognition is a complicated process compared with Latin and Chinese text recognition, owing to the cursive nature of Arabic text.

The work presented in this thesis addresses word-based recognition of handwritten Arabic script. The recognition system is divided into three main stages. The first stage is pre-processing, which is essential for automatic recognition of handwritten documents. In this stage, techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented. Connected components are extracted, and the distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage uses the normalized images to extract the features that are essential for recognizing them. Various methods of feature extraction are implemented and examined. The third and final stage is classification. Several classifiers are used, including the k-nearest neighbour classifier (k-NN), a neural network classifier (NN), Hidden Markov Models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32,492 words using the IFN/ENIT database.

The results are promising in terms of improved baseline detection and word segmentation for subsequent recognition. Moreover, several feature subsets were examined, and a best recognition performance of 81.5% was achieved.
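
The word-segmentation step described in the abstract (measure the gaps between connected components, then derive a threshold from their distribution) can be illustrated with a minimal sketch. The abstract does not say how the threshold is chosen, so the Otsu-style two-class split below is an assumption, as are the `(x_start, x_end)` box format and the function names `gaps_between`, `two_class_threshold`, and `split_into_words`.

```python
from typing import List, Tuple

# Hypothetical input format: horizontal extent of one connected component.
Box = Tuple[int, int]  # (x_start, x_end)


def gaps_between(boxes: List[Box]) -> List[int]:
    """Horizontal gaps between consecutive components, sorted by x position."""
    if not boxes:
        return []
    ordered = sorted(boxes)
    gaps = []
    right_edge = ordered[0][1]
    for start, end in ordered[1:]:
        gaps.append(max(0, start - right_edge))  # overlapping parts count as gap 0
        right_edge = max(right_edge, end)
    return gaps


def two_class_threshold(gaps: List[int]) -> float:
    """Assumed rule: choose the split that maximises between-class variance."""
    values = sorted(set(gaps))
    if not values:
        return 0.0
    best_t, best_score = float(values[-1]), -1.0
    # Candidate thresholds lie midway between adjacent distinct gap values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        small = [g for g in gaps if g <= t]
        large = [g for g in gaps if g > t]
        w_s, w_l = len(small) / len(gaps), len(large) / len(gaps)
        mean_s, mean_l = sum(small) / len(small), sum(large) / len(large)
        score = w_s * w_l * (mean_s - mean_l) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def split_into_words(boxes: List[Box], threshold: float) -> List[List[Box]]:
    """Start a new word whenever the gap to the previous component exceeds the threshold."""
    if not boxes:
        return []
    ordered = sorted(boxes)
    words = [[ordered[0]]]
    right_edge = ordered[0][1]
    for box in ordered[1:]:
        if box[0] - right_edge > threshold:
            words.append([box])
        else:
            words[-1].append(box)
        right_edge = max(right_edge, box[1])
    return words
```

With five components whose gaps are 2, 2, 20, and 2 pixels, this rule separates the small intra-word gaps from the one large inter-word gap and yields two words. Reading order (right to left for Arabic) does not affect the gap sizes, so the sketch simply sorts by x position.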