A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images

Abstract

Reliable extraction/segmentation of text lines, words and characters is one of the most important steps in developing automated systems that understand the text in low resolution display board images. This paper presents a new approach for segmenting text lines, words and characters from Kannada text in low resolution display board images. The proposed method uses projection profile features and pixel distribution statistics to segment text lines. It also detects text lines containing consonant modifiers, merges them with the corresponding text lines, and efficiently separates overlapping text lines. The character extraction process computes character boundaries from vertical profile features in order to extract character images from every text line. The word segmentation process then uses k-means clustering to group inter-character gaps into character-space and word-space clusters, which are used to compute thresholds for extracting words. The method also handles variations in character and word gaps. The methodology is evaluated on a data set of 1008 low resolution images of display boards containing Kannada text. The images were captured with 2-megapixel mobile phone cameras at sizes of 240x320, 600x800 and 900x1200. The method achieves a text line segmentation accuracy of 97.17%, a word segmentation accuracy of 97.54% and a character extraction accuracy of 99.09%. It is tolerant to several sources of difficulty: font variability, spacing variations between characters and words, the absence of a free segmentation path caused by consonant and vowel modifiers, noise and other degradations. Experiments on images containing overlapping text lines have given promising results.
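The projection-profile idea behind the line and character steps can be illustrated with a minimal sketch. It assumes an already-binarized image stored as a list of rows (1 = text pixel, 0 = background). It also uses the simplest possible rule. A text line is a maximal band of rows whose horizontal profile reaches `min_value`. A character is a maximal band of columns, inside that line, whose vertical profile reaches `min_value`. All function names are hypothetical. The paper's pixel distribution statistics, consonant-modifier merging and overlapping-line separation are not reproduced here.

```python
from typing import List, Optional, Tuple

# Binary image as a list of rows: 1 = ON (text) pixel, 0 = background.
Image = List[List[int]]
Span = Tuple[int, int]  # half-open index range [start, end)


def find_runs(profile: List[int], min_value: int = 1) -> List[Span]:
    """Return maximal [start, end) ranges where profile[i] >= min_value."""
    runs: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value >= min_value and start is None:
            start = i  # entering a band of ink
        elif value < min_value and start is not None:
            runs.append((start, i))  # leaving the band
            start = None
    if start is not None:  # band extends to the end of the profile
        runs.append((start, len(profile)))
    return runs


def horizontal_profile(image: Image) -> List[int]:
    """ON-pixel count of each row."""
    return [sum(row) for row in image]


def vertical_profile(image: Image, line: Span) -> List[int]:
    """ON-pixel count of each column, restricted to the rows of one text line."""
    top, bottom = line
    width = len(image[0]) if image else 0
    return [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]


def segment_lines(image: Image, min_value: int = 1) -> List[Span]:
    """Text lines = row bands whose horizontal profile reaches min_value."""
    return find_runs(horizontal_profile(image), min_value)


def segment_characters(image: Image, line: Span, min_value: int = 1) -> List[Span]:
    """Character column ranges inside one text line, from its vertical profile."""
    return find_runs(vertical_profile(image, line), min_value)


if __name__ == "__main__":
    img = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 0, 0, 1],
        [1, 0, 0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 1, 0],
    ]
    for line in segment_lines(img):
        print(line, segment_characters(img, line))
```

On the toy image, `segment_lines` finds the row bands `(1, 3)` and `(4, 5)`. The first line splits into the column ranges `(0, 2)`, `(3, 4)` and `(6, 7)`, and the second line into `(1, 3)` and `(5, 6)`. On noisy camera images a strict "any ink" test is fragile, so `min_value` would have to be raised or derived from the image. How to choose it is left open in this sketch.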

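The word segmentation step described in the abstract can be sketched in the same spirit. The sketch runs a plain two-cluster 1-D k-means over the gaps between consecutive character boxes of one text line, such as the column ranges a vertical-profile pass produces. The low centroid stands for character spacing and the high centroid for word spacing. Two rules here are assumptions of this illustration, not details given in the abstract. First, any gap above the midpoint of the two centroids is treated as a word break. Second, a line whose gaps are all equal is treated as a single word. The helper names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # character column range [start, end)


def kmeans_1d_two_clusters(values: List[float], max_iter: int = 100) -> Tuple[float, float]:
    """Lloyd's k-means with k=2 on scalars; returns (low_centroid, high_centroid)."""
    low, high = float(min(values)), float(max(values))  # initialise at the extremes
    for _ in range(max_iter):
        # Assignment step: each gap joins the nearer centroid (ties go to "low").
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Update step: move each centroid to the mean of its members.
        new_low = sum(near_low) / len(near_low) if near_low else low
        new_high = sum(near_high) / len(near_high) if near_high else high
        if new_low == low and new_high == high:
            break  # converged
        low, high = new_low, new_high
    return low, high


def word_threshold(gaps: List[int]) -> float:
    """Assumed rule: midpoint between the character-gap and word-gap centroids."""
    low, high = kmeans_1d_two_clusters([float(g) for g in gaps])
    return (low + high) / 2.0


def group_into_words(char_boxes: List[Span]) -> List[List[Span]]:
    """Split the left-to-right character boxes of one text line into words."""
    if len(char_boxes) < 2:
        return [list(char_boxes)] if char_boxes else []
    # Inter-character gap = start of the next box minus end of the previous box.
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(char_boxes, char_boxes[1:])]
    if min(gaps) == max(gaps):
        return [list(char_boxes)]  # assumption: uniform gaps mean a single word
    threshold = word_threshold(gaps)
    words: List[List[Span]] = [[char_boxes[0]]]
    for gap, box in zip(gaps, char_boxes[1:]):
        if gap > threshold:
            words.append([box])  # wide gap: start a new word
        else:
            words[-1].append(box)  # narrow gap: same word
    return words
```

Consider boxes whose gaps are 2, 2, 11, 2 and 18 pixels. The centroids converge to 2.0 and 14.5, the threshold is 8.25, and the line splits into words of three, two and one characters. Because k-means always returns two clusters, a line holding one word with slightly uneven character gaps would still be split. A practical version would need an extra check, for example a minimum ratio between the two centroids, and the source does not specify one.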

Bibliographic Information

Source
International Conference on Signal and Image Processing | 2014 | pp. 42-49 | 8 pages
Authors
Angadi S.A.; Kodabagi M.M.
Original format: PDF
Keywords
Display Boards; K-Means Clustering; Low Resolution Images; Projection Profile Features; Segmentation;

机译：显示板; K-均值聚类;低分辨率图像;投影轮廓特征;分段;

Similar Literature

1. A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images [J]. S. A. Angadi, M. M. Kodabagi. International Journal of Image and Graphics, 2014, Issue 1a2.
2. Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images [J]. Ram Sarkar, Samir Malakar, Nibaran Das. Journal of Intelligent Systems, 2011, Issue 3.
3. Text Character Extraction Implementation from Captured Handwritten Image to Text Conversion using Template Matching Technique [J]. Seema Barate, Chaitrali Kamthe, Shweta Phadtare. MATEC Web of Conferences, 2016, Issue 2016.
4. A Robust Segmentation Technique for Line, Word and Character Extraction from Kannada Text in Low Resolution Display Board Images [C]. Angadi S.A., Kodabagi M.M. International Conference on Signal and Image Processing, 2014.
5. Feature extraction in digitized images through image segmentation techniques [D]. Prasadarao, Mokkarala V., 1983.
6. Text Extraction from Scene Images by Character Appearance and Structure Modeling [O]. Chucai Yi, Yingli Tian.
7. A Study of Different Text Line Extraction Techniques for Multi-font and Multi-size Printed Kannada Documents [O]. R Prajna, Ramya V R, Mamatha H.R., 2015.