Conference: International Conference on Signal and Image Processing

A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images

Abstract

Reliable extraction (segmentation) of text lines, words and characters is an important step in developing automated systems that understand text in low-resolution display board images. This paper presents a new approach for segmenting text lines, words and characters from Kannada text in low-resolution display board images. The proposed method uses projection profile features and on-pixel distribution statistics to segment text lines. It also detects text lines containing consonant modifiers, merges them with their corresponding text lines, and efficiently separates overlapped text lines. The character extraction process computes character boundaries from vertical profile features to extract character images from each text line. The word segmentation process then uses k-means clustering to group inter-character gaps into character-space and word-space clusters, which are used to compute thresholds for extracting words. The method also handles variations in character and word gaps. The methodology is evaluated on a data set of 1008 low-resolution images of display boards containing Kannada text, captured with 2-megapixel mobile phone cameras at sizes of 240x320, 600x800 and 900x1200. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. It is tolerant to font variability, spacing variations between characters and words, the absence of a free segmentation path caused by consonant and vowel modifiers, noise, and other degradations. Experiments with images containing overlapped text lines have given promising results.
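The line, character and word segmentation steps summarized above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be an already binarized image given as a list of rows (1 = text pixel), text lines and characters are taken as runs of non-zero horizontal and vertical projection profiles, the modifier-merging rule (`merge_modifier_bands` with its `ratio` parameter) is a hypothetical heuristic, and the word threshold is assumed to be the midpoint between the two centroids of a 1-D k-means (k = 2) over inter-character gaps. All function names are illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized image: 1 = text ("on") pixel, 0 = background
Span = Tuple[int, int]   # half-open [start, end) index range


def runs(profile: List[int]) -> List[Span]:
    """Return the maximal runs where a projection profile is non-zero."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    # Horizontal projection profile: number of on pixels in each row.
    return runs([sum(row) for row in img])


def merge_modifier_bands(lines: List[Span], ratio: float = 0.5) -> List[Span]:
    # Hypothetical heuristic: a band much shorter than the (upper) median line
    # height is assumed to hold only consonant modifiers written below the main
    # line, so it is merged into the band directly above it.
    if not lines:
        return []
    heights = sorted(bottom - top for top, bottom in lines)
    median = heights[len(heights) // 2]
    merged = [lines[0]]
    for top, bottom in lines[1:]:
        if bottom - top < ratio * median:
            merged[-1] = (merged[-1][0], bottom)
        else:
            merged.append((top, bottom))
    return merged


def segment_characters(img: Image, top: int, bottom: int) -> List[Span]:
    # Vertical projection profile of one text line: on pixels per column.
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


def two_means_threshold(gaps: List[int], iters: int = 20) -> float:
    # 1-D k-means with k = 2: split gaps into "character" and "word" clusters.
    lo, hi = float(min(gaps)), float(max(gaps))
    if lo == hi:
        return hi + 1  # assumption: identical gaps are all character gaps
    for _ in range(iters):
        small = [g for g in gaps if abs(g - lo) <= abs(g - hi)]
        large = [g for g in gaps if abs(g - lo) > abs(g - hi)]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large) if large else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster assignment is stable
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2  # midpoint between the two centroids


def group_words(chars: List[Span]) -> List[Span]:
    # Merge consecutive character spans whose gap does not exceed the threshold.
    if len(chars) < 2:
        return list(chars)
    gaps = [chars[i + 1][0] - chars[i][1] for i in range(len(chars) - 1)]
    threshold = two_means_threshold(gaps)
    words: List[Span] = []
    start = chars[0][0]
    for i, gap in enumerate(gaps):
        if gap > threshold:
            words.append((start, chars[i][1]))
            start = chars[i + 1][0]
    words.append((start, chars[-1][1]))
    return words


if __name__ == "__main__":
    rows = ["##.##....##.##", "##.##....##.##", "..............", "###...###....."]
    img = [[1 if ch == "#" else 0 for ch in r] for r in rows]
    for top, bottom in merge_modifier_bands(segment_lines(img)):
        chars = segment_characters(img, top, bottom)
        print(f"line ({top}, {bottom}): words {group_words(chars)}")
```

On the toy image in the demo, the first text line (rows 0 to 2) yields four character spans separated by gaps of 1, 4 and 1 columns; the two clusters have centroids 1 and 4, so the threshold is 2.5 and `group_words` returns the words `(0, 5)` and `(9, 14)`. Clustering the gaps per line, rather than fixing a global threshold, is what lets this kind of scheme adapt to the character and word gap variations mentioned in the abstract; separating overlapped text lines is not attempted here.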