Conference: IAPR International Conference on Document Analysis and Recognition

New Morphological Markovian Approach for Analysis and Recognition of Open Arabic Canonical Vocabulary

Abstract

Arabic writing recognition remains a hard problem because of the language's inflectional nature and the script's great topological variability. We therefore continue to investigate the use of linguistic knowledge to improve the recognition of wide, and ultimately open, Arabic word lexicons. In this paper, we propose a new approach for recognizing decomposable Arabic words using planar (i.e., two-dimensional) Markov models that embody two crucial aspects of Arabic: language morphology and script topology. The contribution extends the target from wide-lexicon recognition toward open-lexicon recognition. For training, we use planar hidden Markov models, each dedicated to learning a sub-vocabulary derived from one root. For recognition, we use the already trained models; for words that were not learned, we create a new model on the fly that can recognize words derived from new, untrained ("intruder") roots. Preliminary experiments conducted on a corpus of about 3000 samples of Arabic words yielded promising results.
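
The "one model per root" recognition step can be illustrated with a minimal sketch. It rests on strong assumptions: ordinary one-dimensional discrete HMMs stand in for the planar HMMs of the paper, the root names and all probabilities are hypothetical toy values, and the on-the-fly creation of a model for an untrained root is not shown because the abstract does not describe how it is done. The sketch only shows scoring an observation sequence against each root's model with the forward algorithm and keeping the best-scoring root.

```python
import math

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM.

    Uses the forward algorithm with per-step rescaling to avoid underflow.
    obs: list of symbol indices; start[s], trans[p][s], emit[s][symbol].
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

def recognize(obs, models):
    """Return the root whose model scores obs highest, plus all scores."""
    scores = {root: log_forward(obs, *params) for root, params in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy models: two states, two observation symbols per root.
_TRANS = [[0.5, 0.5], [0.0, 1.0]]
TOY_MODELS = {
    "root_a": ([1.0, 0.0], _TRANS, [[0.9, 0.1], [0.2, 0.8]]),
    "root_b": ([1.0, 0.0], _TRANS, [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    best, scores = recognize([0, 0, 1, 1], TOY_MODELS)
    print(best, scores)
```

In a real system the observation symbols would come from features extracted from the word image, and each root's parameters would be estimated from that root's training samples rather than written by hand.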
