Journal: Sadhana

Devanagari ancient documents recognition using statistical feature extraction techniques


Abstract

Recognition of ancient Devanagari documents is attracting considerable attention from researchers. These documents contain a wealth of knowledge, but their fragile condition keeps them out of reach for most readers. A Devanagari ancient manuscript recognition system is designed for digital archiving. The system comprises image binarization, character segmentation and recognition phases, and it automatically recognizes scanned and segmented characters. Segmented characters may include basic characters (vowels and consonants), modifiers (matras) and various compound characters (characters formed by joining more than one basic character). This paper presents a recognition system for handwritten ancient Devanagari manuscripts based on statistical feature extraction techniques. In the feature extraction phase, intersection points, open endpoints, centroid, horizontal peak extent and vertical peak extent features are extracted. For classification, Convolutional Neural Network, Neural Network, Multilayer Perceptron, RBF-SVM and random forest techniques are considered. The feature extraction and classification techniques are compared on the recognition of basic characters segmented from ancient Devanagari manuscripts. A data set of 6152 pre-segmented samples from ancient Devanagari documents is used for the experimental work. The authors achieve 88.95% recognition accuracy using a combination of all features and a combination of all classifiers considered in this work, merged by a simple majority voting scheme.
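The simple majority voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `ensemble_predict`, the sample labels, and the tie-break rule (the earliest-listed classifier wins) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label that appears first in `predictions`
    (an assumed tie-break rule; the abstract does not state one).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so ties are resolved deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_labels):
    """Combine per-classifier label lists sample by sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of five classifiers on three character samples.
votes = [
    ["ka", "kha", "ga"],   # CNN
    ["ka", "kha", "gha"],  # neural network
    ["ka", "ga", "gha"],   # multilayer perceptron
    ["kha", "kha", "ga"],  # RBF-SVM
    ["ka", "ga", "ga"],    # random forest
]
print(ensemble_predict(votes))  # prints ['ka', 'kha', 'ga']
```

With five voters and more than two classes a plurality can fall short of a strict majority, so a real system would also need to decide how to handle that case.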
