Conference: IET International Conference on Visual Information Engineering

WORD-WISE SCRIPT IDENTIFICATION BASED ON MORPHOLOGICAL RECONSTRUCTION IN PRINTED BILINGUAL DOCUMENTS

Abstract

Owing to the diversity of languages and scripts in India, English has become the country's binding language. As a result, a single line of a bilingual document page may contain words in a regional language together with numerals in English. Before Optical Character Recognition (OCR) can be applied to such a page, the script of each word must be identified so that the appropriate script-specific OCR can be run. This paper proposes an automatic technique for word-level script identification based on morphological reconstruction, for two kinds of printed bilingual documents, Telugu and Devnagari, each containing English numerals as the common script. The technique comprises a feature extractor and classifiers. The feature extractor has two stages. In the first stage, morphological erosion and opening by reconstruction are carried out on the document image in the horizontal and vertical directions using a line structuring element, whose length is fixed based on the average height of all connected components in the image. In the second stage, the average pixel distribution is computed in the resulting images. Nearest neighbor and k-nearest neighbor algorithms are used to classify new word images. The proposed algorithm was tested on 1500 sample words with various font styles and sizes. The results obtained are quite encouraging.
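The pipeline described in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the use of the rounded average component height as the structuring-element length, the reading of "average pixel distribution" as the fraction of ink pixels in each resulting image, and the Euclidean distance in the classifier are all assumptions made for the example. Images are lists of rows of 0/1 values, with 1 marking ink.

```python
from collections import Counter, deque

# 8-connected neighbourhood offsets (assumed connectivity).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def flood(mask, seeds):
    """Binary reconstruction: pixels of `mask` that are 8-connected to a seed."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    queue = deque((y, x) for y, x in seeds if mask[y][x])
    for y, x in queue:
        out[y][x] = 1
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not out[ny][nx]:
                out[ny][nx] = 1
                queue.append((ny, nx))
    return out

def average_component_height(img):
    """Mean bounding-box height over all connected components (0 if no ink)."""
    h, w = len(img), len(img[0])
    seen = [[0] * w for _ in range(h)]
    heights = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                comp = flood(img, [(y, x)])
                rows = [r for r in range(h) if any(comp[r])]
                heights.append(rows[-1] - rows[0] + 1)
                for r in range(h):
                    for c in range(w):
                        seen[r][c] |= comp[r][c]
    return sum(heights) / len(heights) if heights else 0

def erode_line(img, length, horizontal):
    """Erosion by a centred line of `length` pixels; outside the image counts as 0."""
    h, w = len(img), len(img[0])
    offsets = range(-(length // 2), length - length // 2)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            cells = [(y, x + k) if horizontal else (y + k, x) for k in offsets]
            out[y][x] = int(all(0 <= r < h and 0 <= c < w and img[r][c] for r, c in cells))
    return out

def open_by_reconstruction(img, length, horizontal):
    """Return (eroded image, image reconstructed from the eroded marker)."""
    eroded = erode_line(img, length, horizontal)
    seeds = [(y, x) for y, row in enumerate(eroded) for x, v in enumerate(row) if v]
    return eroded, flood(img, seeds)

def density(img):
    """Fraction of ink pixels (assumed meaning of 'average pixel distribution')."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))

def features(img):
    """Four features: eroded and reconstructed densities, horizontal then vertical."""
    length = max(1, round(average_component_height(img)))  # assumed length rule
    feats = []
    for horizontal in (True, False):
        eroded, rebuilt = open_by_reconstruction(img, length, horizontal)
        feats += [density(eroded), density(rebuilt)]
    return feats

def knn_predict(train, query, k=1):
    """train: list of (feature_vector, label); k=1 gives the nearest-neighbour rule."""
    ranked = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
```

In this sketch, a word image would be passed to `features`, and the resulting vector classified with `knn_predict` against labelled training vectors. Opening by reconstruction keeps every connected component that contains at least one line of the given length and restores it in full, which is why long horizontal or vertical strokes can help separate scripts.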
