International Journal of Advanced Information Technology

Dimension Reduction for Script Classification - Printed Indian Documents


Abstract

Automatic identification of the script in a document image supports many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. This paper compares three dimension reduction techniques, namely partial least squares (PLS), sliced inverse regression (SIR), and principal component analysis (PCA), and evaluates the relative performance of classification procedures that incorporate them. For each script, we extract features such as Gray Level Co-occurrence Method (GLCM) features and scale-invariant feature transform (SIFT) features. The features are extracted globally from a given text block, so the approach does not require complex, reliable segmentation of the document image into lines and characters. The extracted features are reduced using each of the dimension reduction techniques, and the reduced features are fed into a nearest neighbor classifier. The proposed scheme is therefore efficient and suitable for practical applications that process large volumes of data. It has been tested on 10 Indian scripts and found to be robust to the scanning process and relatively insensitive to changes in font size. The system achieves good classification accuracy on a large test data set.
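The reduce-then-classify pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it covers only the PCA variant (PLS and SIR are omitted), it uses synthetic Gaussian vectors as a stand-in for real GLCM or SIFT features, and every function name, dimension, and sample count here is an assumption chosen for the example.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the training mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Map feature vectors into the reduced PCA space."""
    return (X - mean) @ components.T


def nearest_neighbor_predict(train_Z, train_y, test_Z):
    """Label each test vector with the class of its closest training vector."""
    # Squared Euclidean distance between every test and training point.
    d2 = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_y[d2.argmin(axis=1)]


def sample_blocks(rng, centers, n_per_class):
    """Hypothetical stand-in for per-text-block features: one blob per script."""
    n_features = centers.shape[1]
    X = np.vstack(
        [c + rng.normal(size=(n_per_class, n_features)) for c in centers]
    )
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y


rng = np.random.default_rng(0)
# Assumed setup: 10 script classes, 64-dimensional global features.
centers = rng.normal(scale=5.0, size=(10, 64))
X_train, y_train = sample_blocks(rng, centers, n_per_class=30)
X_test, y_test = sample_blocks(rng, centers, n_per_class=10)

# Fit the reduction on training data only, then apply it to both sets.
mean, components = fit_pca(X_train, n_components=8)
Z_train = project(X_train, mean, components)
Z_test = project(X_test, mean, components)

y_pred = nearest_neighbor_predict(Z_train, y_train, Z_test)
accuracy = (y_pred == y_test).mean()
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the well-separated synthetic data and says nothing about the results reported in the paper. With real text blocks, `sample_blocks` would be replaced by an actual feature extractor, and the number of retained components would need to be tuned.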
