Journal: Expert Systems with Applications

HAH manuscripts: A holistic paradigm for classifying and retrieving historical Arabic handwritten documents


Abstract

Technologies for reading and searching digital documents have helped academic researchers; however, truly effective search engines for handwritten documents have not yet been developed. There is a growing need to access historical Arabic handwritten manuscripts (HAH manuscripts) stored in large archives, so tools for the automatic searching, indexing, classification, and retrieval of HAH manuscripts are required. The peculiar characteristics of Arabic handwriting add a further challenge to developing such systems. This paper presents a novel holistic technique for classifying and retrieving HAH manuscripts. Classification is performed in several steps. First, the image of a HAH manuscript is segmented into words, and each word is then segmented into its connected parts. Because adjacent connected parts of a single word can overlap, we developed a stretching algorithm that increases the gap between them and thus improves their segmentation. Second, several structural and statistical features devised for Arabic text are extracted from these connected parts and combined into one consolidated feature vector that represents the word. Finally, a neural network learns to classify the input vectors into word classes, and these classes are then used to retrieve HAH manuscripts. Extracting the structural and statistical features from the individual connected parts, rather than from the whole word, improved the system's performance significantly.
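The abstract describes the part-based pipeline only at a high level, so the following Python sketch illustrates the idea under stated assumptions rather than reproducing the authors' method. It assumes a binarized word image already cropped to the word, stored as rows of 0/1 values; it finds connected parts with 8-connected labeling (which separates parts that overlap horizontally as long as they do not touch) and orders them right to left; it uses four simple geometric features per part (relative height, relative width, ink density, vertical centroid) as hypothetical stand-ins for the paper's unspecified structural and statistical features; and it consolidates them into one fixed-length vector with zero padding. The stretching algorithm and the neural network are not reproduced, and `build_index` assumes some trained classifier has already assigned a word class to every word. All function names and the `max_parts` parameter are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical input format: a binarized word image cropped to the word,
# stored as rows of 0/1 values (1 = ink, 0 = background).
Image = list[list[int]]
FEATURES_PER_PART = 4


def connected_parts(img: Image) -> list[list[tuple[int, int]]]:
    """Label 8-connected ink components and return them in right-to-left
    order, since Arabic script is read from right to left."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    parts = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this component.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            parts.append(pixels)
    # Rightmost column first: right-to-left reading order.
    parts.sort(key=lambda p: max(x for _, x in p), reverse=True)
    return parts


def part_features(pixels, word_h: int, word_w: int) -> list[float]:
    """Illustrative stand-in features for one connected part (not the paper's)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    h = max(ys) - min(ys) + 1
    w = max(xs) - min(xs) + 1
    return [
        h / word_h,                     # height relative to the word
        w / word_w,                     # width relative to the word
        len(pixels) / (h * w),          # ink density inside the bounding box
        sum(ys) / len(ys) / word_h,     # vertical centroid relative to the word
    ]


def word_vector(img: Image, max_parts: int = 4) -> list[float]:
    """Consolidate per-part features into one fixed-length word vector:
    the part count, then the features of up to `max_parts` parts in
    reading order, zero-padded when the word has fewer parts."""
    word_h, word_w = len(img), len(img[0])
    parts = connected_parts(img)
    vec = [float(len(parts))]
    for i in range(max_parts):
        if i < len(parts):
            vec.extend(part_features(parts[i], word_h, word_w))
        else:
            vec.extend([0.0] * FEATURES_PER_PART)
    return vec


def build_index(predictions) -> dict:
    """Inverted index from word class to the manuscripts containing it.
    `predictions` holds (manuscript_id, word_class) pairs, assumed to come
    from a trained classifier applied to every segmented word."""
    index = defaultdict(set)
    for doc_id, word_class in predictions:
        index[word_class].add(doc_id)
    return index
```

For a 3×6 grid containing two separate strokes, `word_vector` returns 17 values: the part count 2.0, four features for the right-hand stroke, four for the left-hand stroke, and eight zeros of padding. Fixed-length padding matters because a feed-forward network expects the same input size for every word; words with more than `max_parts` parts are simply truncated here, which is a simplification.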
