Journal: International Journal of Image, Graphics and Signal Processing

Convolution Based Technique for Indic Script Identification from Handwritten Document Images



Abstract

Determining the script of a document image is a complex real-life problem in a multi-script country like India, where 23 official languages (including English) are written in 13 different scripts (including Roman). The problem becomes more challenging when handwritten documents are considered. This paper proposes an approach for identifying the script of handwritten document images written in any one of the Bangla, Devnagari, Roman and Urdu scripts. Two convolution-based techniques, namely Gabor filtering and morphological reconstruction, are combined to construct a 20-dimensional feature vector. Because no standard dataset is available, a corpus of 157 document images with an almost equal share of the four scripts is prepared. For classification, the dataset is split into training and test sets in a 2:1 ratio, and an average identification accuracy of 94.4% is obtained on the test set. The average bi-script and tri-script identification accuracies are 98.2% and 97.5%, respectively. A statistical performance analysis is carried out using several well-known classifiers.
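The abstract names the two feature families but not their parameters, so the following is only a minimal Python/NumPy sketch of the general idea, not the authors' implementation. Every concrete choice in it is a hypothetical assumption: a bank of 8 real Gabor filters (4 orientations at 2 scales), each summarised by the mean and standard deviation of its absolute response (16 values), plus 4 values from binary opening by reconstruction with horizontal and vertical line markers of two lengths (the fraction of ink pixels lying in connected components that contain such a line). The 16 + 4 split of the 20 dimensions, the filter parameters, and all helper names (`gabor_kernel`, `script_features`, `split_two_to_one`, and so on) are illustrative. The split helper also assumes a per-class (stratified) 2:1 split, which the abstract does not state.

```python
import numpy as np

# Minimal sketch. All parameters below are hypothetical; the abstract only says that
# Gabor filtering and morphological reconstruction yield a 20-dimensional feature vector.


def gabor_kernel(sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel: Gaussian envelope times a cosine carrier."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_t / lam + psi)


def convolve_same(image, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the input size."""
    kh, kw = kernel.shape
    h, w = image.shape
    shape = (h + kh - 1, w + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    return full[kh // 2:kh // 2 + h, kw // 2:kw // 2 + w]


def erode_line(img, length, axis):
    """Binary erosion with a straight line of `length` pixels along `axis` (1 = horizontal)."""
    half = length // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    p = np.pad(img, pad, mode="constant", constant_values=0)
    out = np.ones_like(img)
    for k in range(2 * half + 1):
        window = [slice(None), slice(None)]
        window[axis] = slice(k, k + img.shape[axis])
        out = np.minimum(out, p[tuple(window)])
    return out


def dilate3x3(img):
    """Dilation with a 3x3 square (8-connectivity)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="constant", constant_values=img.min())
    out = img.copy()
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out


def reconstruct_by_dilation(marker, mask):
    """Morphological reconstruction: dilate the marker repeatedly, clipped by the mask."""
    marker = np.minimum(marker, mask)
    while True:
        grown = np.minimum(dilate3x3(marker), mask)
        if np.array_equal(grown, marker):
            return marker
        marker = grown


def script_features(ink):
    """20-D vector from a binarised page (1 = ink): 16 Gabor + 4 reconstruction features."""
    ink = (np.asarray(ink) > 0).astype(float)
    feats = []
    for sigma, lam in [(2.0, 6.0), (4.0, 12.0)]:          # two hypothetical scales
        for theta in np.arange(4) * np.pi / 4:             # 0, 45, 90, 135 degrees
            resp = np.abs(convolve_same(ink, gabor_kernel(sigma, theta, lam)))
            feats += [resp.mean(), resp.std()]
    total = max(ink.sum(), 1.0)
    for axis in (0, 1):                                    # vertical, then horizontal lines
        for length in (9, 17):
            # Opening by reconstruction: keep whole components that contain a line of this length.
            kept = reconstruct_by_dilation(erode_line(ink, length, axis), ink)
            feats.append(kept.sum() / total)
    return np.array(feats)


def split_two_to_one(labels, seed=0):
    """Per-class 2:1 train/test split (stratification is an assumption)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        cut = int(round(len(idx) * 2 / 3))
        train += idx[:cut].tolist()
        test += idx[cut:].tolist()
    return np.array(sorted(train)), np.array(sorted(test))


if __name__ == "__main__":
    page = np.zeros((40, 60))
    page[10, 5:55] = 1      # long horizontal stroke, loosely like a headline
    page[10:30, 20] = 1     # vertical stroke hanging from it
    print(script_features(page).shape)  # (20,)
```

Line-shaped markers are used in this sketch because Bangla and Devnagari characters are typically joined by a long horizontal headline, which such a feature would respond to; whether the paper relies on this cue is not stated in the abstract. The resulting vectors could then be passed to any standard classifier trained on the training portion of the split.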

