Published in: International Conference on Signal and Image Processing

A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images

Abstract

Reliable extraction (segmentation) of text lines, words and characters is a key step in building automated systems that understand text in low-resolution display board images. This paper presents a new approach for segmenting text lines, words and characters from Kannada text in low-resolution display board images. The proposed method uses projection profile features and on-pixel distribution statistics to segment text lines. It also detects text lines that contain consonant modifiers and merges them with their corresponding text lines. It also efficiently separates overlapping text lines. The character extraction process computes character boundaries from vertical profile features and uses them to extract character images from each text line. The word segmentation process then uses k-means clustering to group inter-character gaps into a character-space cluster and a word-space cluster. These clusters are used to compute thresholds for extracting words. The method also handles variations in character and word gaps. The methodology is evaluated on a data set of 1008 low-resolution images of display boards containing Kannada text. The images were captured with 2-megapixel mobile phone cameras at sizes of 240x320, 600x800 and 900x1200. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. It tolerates font variability and spacing variations between characters and words. It also tolerates noise, other degradations, and the absence of a free segmentation path caused by consonant and vowel modifiers. Experiments with images containing overlapping text lines have given promising results.
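The following is a minimal sketch of line segmentation with a horizontal projection profile. It assumes a binarized image given as a list of rows, with 1 for an on (text) pixel and 0 for background. Under that assumption, it splits the image wherever a row has fewer than `min_ink` on pixels. The function names and the `min_ink` parameter are hypothetical. The paper's method also uses on-pixel distribution statistics, and this sketch does not reproduce them.

```python
from typing import List, Tuple

Binary = List[List[int]]  # assumed input: 1 = on (text) pixel, 0 = background
Band = Tuple[int, int]    # (top, bottom) rows of a text line, bottom exclusive

def horizontal_profile(img: Binary) -> List[int]:
    """Count the on pixels in each row (the horizontal projection profile)."""
    return [sum(row) for row in img]

def segment_lines(img: Binary, min_ink: int = 1) -> List[Band]:
    """Return each run of rows holding at least min_ink on pixels as one band.

    min_ink is a hypothetical noise threshold, not a parameter from the paper.
    """
    profile = horizontal_profile(img)
    bands: List[Band] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # entering a text band
        elif count < min_ink and start is not None:
            bands.append((start, y))       # leaving a text band
            start = None
    if start is not None:                  # band touches the bottom edge
        bands.append((start, len(profile)))
    return bands
```

For example, `segment_lines([[0,0,0],[1,1,0],[1,0,1],[0,0,0],[0,1,1],[0,0,0]])` returns `[(1, 3), (4, 5)]`. Raising `min_ink` above 1 is one simple way to ignore isolated noise pixels in low-resolution images.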
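Merging consonant-modifier lines back into their text lines can be approximated with a height heuristic. In Kannada, consonant conjuncts are written below the base consonant, so a projection profile may split them off as a separate thin band. This sketch assumes that any band much shorter than the median band height holds only such modifiers, and it merges that band into the band above it. The heuristic, the `ratio` threshold and the function name are hypothetical and are not taken from the paper.

```python
from statistics import median
from typing import List, Tuple

Band = Tuple[int, int]  # (top, bottom) rows of a line band, bottom exclusive

def merge_modifier_bands(bands: List[Band], ratio: float = 0.5) -> List[Band]:
    """Merge unusually short bands into the band directly above them.

    Hypothetical heuristic: a band shorter than ratio * median band height
    is assumed to hold only consonant modifiers. The first band is never
    merged, because nothing lies above it.
    """
    if not bands:
        return []
    med = median([bottom - top for top, bottom in bands])
    merged: List[Band] = [bands[0]]
    for top, bottom in bands[1:]:
        if bottom - top < ratio * med:
            prev_top, _ = merged[-1]
            merged[-1] = (prev_top, bottom)   # extend the previous line downwards
        else:
            merged.append((top, bottom))
    return merged
```

With bands `[(0, 20), (22, 27), (40, 60), (62, 66)]`, the median height is 12.5. The 5-row and 4-row bands fall below 6.25 rows, so the result is `[(0, 27), (40, 66)]`.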
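Character boundaries from vertical profile features can be sketched the same way, this time by summing each column of a single text-line image. This sketch treats each run of inked columns as one character, which assumes that neighbouring characters do not touch. It also crops the character images and returns the inter-character gaps that the word segmentation step clusters. All function names are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # one text line; 1 = on (text) pixel, 0 = background
Span = Tuple[int, int]    # (left, right) columns of a character, right exclusive

def vertical_profile(line_img: Binary) -> List[int]:
    """Count the on pixels in each column of a text-line image."""
    return [sum(col) for col in zip(*line_img)]

def segment_characters(line_img: Binary, min_ink: int = 1) -> List[Span]:
    """Treat each run of columns with at least min_ink on pixels as a character."""
    profile = vertical_profile(line_img)
    spans: List[Span] = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x                      # entering a character
        elif count < min_ink and start is not None:
            spans.append((start, x))       # leaving a character
            start = None
    if start is not None:                  # character touches the right edge
        spans.append((start, len(profile)))
    return spans

def extract_character_images(line_img: Binary, spans: List[Span]) -> List[Binary]:
    """Crop one sub-image per character span, keeping the full line height."""
    return [[row[left:right] for row in line_img] for left, right in spans]

def inter_character_gaps(spans: List[Span]) -> List[int]:
    """Width in columns of the blank space between consecutive characters."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
```

For the line `[[1,1,0,1,0,0,0,1],[1,0,0,1,1,0,0,1]]`, the column sums are `[2,1,0,2,1,0,0,2]`. The character spans are `[(0, 2), (3, 5), (7, 8)]`, and the gaps are `[1, 2]`.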
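The word segmentation step groups inter-character gaps into a character-space cluster and a word-space cluster with k-means. The following is a minimal sketch under several assumptions:

- k = 2 on the one-dimensional gap widths of a single text line.
- Centroids start at the smallest and largest gap.
- The word threshold is the midpoint between the two final centroids.

The paper's exact threshold rule is not given here. The `min_separation` guard for lines with no clearly wider gap is a hypothetical addition, not something the paper describes.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (left, right) columns of a character, right exclusive

def kmeans_1d_two(values: List[float], iters: int = 100) -> Tuple[float, float]:
    """Two-cluster k-means on 1-D values; returns (small_centroid, large_centroid).

    Centroids start at the minimum and maximum value; ties go to the small cluster.
    """
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(small) / len(small) if small else lo
        new_hi = sum(large) / len(large) if large else hi
        if (new_lo, new_hi) == (lo, hi):   # assignments stable: converged
            break
        lo, hi = new_lo, new_hi
    return lo, hi

def split_words(spans: List[Span], min_separation: float = 2.0) -> List[List[Span]]:
    """Group the character spans of one text line into words."""
    if not spans:
        return []
    if len(spans) < 2:
        return [list(spans)]
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]
    char_space, word_space = kmeans_1d_two(gaps)
    if word_space - char_space < min_separation:
        # Hypothetical guard: no clearly wider gap, so treat the line as one word.
        return [list(spans)]
    threshold = (char_space + word_space) / 2
    words: List[List[Span]] = [[spans[0]]]
    for gap, span in zip(gaps, spans[1:]):
        if gap > threshold:
            words.append([span])          # word-space gap: start a new word
        else:
            words[-1].append(span)        # character-space gap: same word
    return words
```

Take the spans `[(0,5), (6,11), (12,17), (25,30), (31,36), (45,50)]`. The gaps are `[1, 1, 8, 1, 9]`, and the centroids converge to 1.0 and 8.5, which gives a threshold of 4.75. The line therefore splits into three words of three, two and one characters. Because the threshold is recomputed from each line's own gaps, it adapts to the variations in character and word spacing that the abstract mentions.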