
Feature design and lexicon reduction for efficient offline handwriting recognition.


Abstract

This thesis establishes a pattern recognition framework for offline word recognition systems. It focuses on image-level features because they greatly influence recognition performance. In particular, we consider two complementary aspects of the impact of prominent features: lexicon reduction and the actual recognition. The first aspect, lexicon reduction, consists in designing a weak classifier that outputs a set of candidate word hypotheses given a word image. Its main purpose is to reduce the computational time of recognition while maintaining (or even improving) the recognition rate. The second aspect is the recognition system itself. Several features exist in the literature, drawn from different fields of research, but there is no consensus on which are the most promising. The goal of the proposed framework is to improve our understanding of relevant features in order to build better recognition systems. For this purpose, we addressed two specific problems: 1) feature design for lexicon reduction (application to Arabic script), and 2) feature evaluation for cursive handwriting recognition (application to Latin and Arabic scripts).

Few methods exist for lexicon reduction in Arabic script, unlike Latin script. Existing methods use salient features of Arabic words, such as the number of subwords and diacritics, but ignore the shape of the subwords entirely. Therefore, our first goal is to perform lexicon reduction based on subword shape. Our approach is based on shape indexing, where the shape of a query subword is compared to a labeled database of sample subwords. For efficient comparison with a low computational overhead, we proposed the weighted topological signature vector (W-TSV) framework, where the subword shape is modeled as a weighted directed acyclic graph (DAG) from which the W-TSV vector is extracted for efficient indexing. The main contributions of this work are to extend the existing TSV framework to weighted DAGs and to propose a shape indexing approach for lexicon reduction. Good lexicon reduction performance is achieved for Arabic subwords. Nevertheless, the performance remains modest for Arabic words.

Considering the results of our first work on Arabic lexicon reduction, we propose to build a new index for better performance at the word level. The subword shape and the number of subwords and diacritics are all important components of Arabic word shape. We therefore propose the Arabic word descriptor (AWD), which integrates all the aforementioned components. It is built in two steps. First, a structural descriptor (SD) is computed for each connected component (CC) of the word image. It describes the CC shape using the bag-of-words model, where each visual word represents a different local shape structure. Then, the AWD is formed by concatenating the SDs using an efficient heuristic, implicitly discriminating between subwords and diacritics. In the context of lexicon reduction, the AWD is used to index a reference database. The main contribution of this work is the design of the AWD, which integrates low-level cues (subword shape structure) and symbolic information (subword counts and diacritics) into a single descriptor. The proposed method has a low computational overhead, is simple to implement, and provides state-of-the-art performance for lexicon reduction on two Arabic databases, namely the Ibn Sina database of subwords and the IFN/ENIT database of words.

The last part of this thesis focuses on features for word recognition. A large body of features exists in the literature, each motivated by a different field, such as pattern recognition, computer vision or machine learning. Identifying the most promising approaches would improve the design of the next generation of features. Nevertheless, because these features are based on different concepts, they are difficult to compare on theoretical grounds, and efficient empirical tools are needed. Therefore, the last objective of the thesis is to provide a method for feature evaluation that assesses the strength and complementarity of existing features. A combination scheme has been designed for this purpose, in which each feature is evaluated through a reference recognition system based on recurrent neural networks. More precisely, each feature is represented by an agent, which is an instance of the recognition system trained with that feature. The decisions of all the agents are combined using a weighted vote. The weights are jointly optimized during a training phase in order to increase the weighted vote of the true word label. Therefore, they reflect the strength and complementarity of the agents and their features for the given task. Finally, the weights are converted into a numerical score assigned to each feature, which is easy to interpret under this combination model. Five state-of-the-art features have been tested, and our results provide interesting insight for future feature design. (Abstract shortened by UMI.)
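The weighted vote described above can be illustrated with a minimal sketch. It assumes each agent outputs a single word label and that the weights are already known; the function name `weighted_vote`, the example labels, and the hand-picked weights are hypothetical and do not come from the thesis, and the joint optimization of the weights is not shown because the abstract does not describe it.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted word label per agent into a single decision.

    predictions: list of labels, one per agent (hypothetical input format).
    weights: list of non-negative floats, one per agent.
    Returns the winning label and the accumulated vote per label.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per agent prediction")
    if not predictions:
        raise ValueError("at least one agent is required")

    # Each agent adds its weight to the label it voted for.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # The label with the largest accumulated weight wins.
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Illustrative values only: three agents, two of which agree.
winner, totals = weighted_vote(["kitab", "kitab", "katib"], [0.2, 0.3, 0.6])
```

In this toy case the single agent with weight 0.6 outvotes the two agreeing agents whose weights sum to 0.5, which shows how a large weight lets one feature dominate the combined decision.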

Bibliographic record

  • Author

    Chherawala, Youssouf

  • Author affiliation

    Ecole de Technologie Superieure (Canada)

  • Degree-granting institution Ecole de Technologie Superieure (Canada)
  • Subject Artificial Intelligence; Computer Science
  • Degree D.Eng.
  • Year 2014
  • Pages 171 p.
  • Total pages 171
  • Original format PDF
  • Language English
