National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics

Script independent detection of bold words in multi font-size documents

Abstract

A script-independent, font-size-independent scheme is proposed for detecting bold words in printed pages. In OCR applications such as making minor modifications to an existing printed form, it is desirable for the recognized document to reproduce the original font size and attributes such as bold and italics. In the proposed morphological-opening-based detection of bold (MOBDoB) method, the binarized image is segmented into sub-images of uniform font size using word height information. A rough estimate of the stroke width of the characters in each sub-image is obtained from its density. Each sub-image is then opened with a square structuring element whose size is determined by the respective stroke width. The union of all the opened sub-images is used to locate the bold words, and extracting all such words from the binarized image gives the final image. Across a total of 65 Tamil, Kannada and English pages, at least 98% of bold words were detected, with a false alarm rate below 0.4%.
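The abstract outlines the pipeline but not its exact rules, so the following is only a minimal pure-Python sketch of the idea under stated assumptions, not the MOBDoB implementation. It assumes the page is already binarized (1 = ink) and that word bounding boxes are already available (word segmentation is not shown). The font-size grouping tolerance `tol`, the structuring-element side of "stroke width + 1", and the `min_survivors` threshold are hypothetical choices. The paper's density-based stroke-width estimate is also swapped for a simpler stand-in: the median length of horizontal ink runs in each group. All function names are illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]          # binarized page: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # word box (top, left, bottom, right); bottom/right exclusive


def group_by_height(words: List[Box], tol: float = 0.2) -> List[List[Box]]:
    """Group words into font-size classes (hypothetical rule: a word joins the current
    group if its height is within `tol` of the group's smallest height)."""
    groups: List[List[Box]] = []
    for box in sorted(words, key=lambda b: b[2] - b[0]):
        height = box[2] - box[0]
        if groups and height <= (groups[-1][0][2] - groups[-1][0][0]) * (1 + tol):
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups


def stroke_width(img: Image, boxes: List[Box]) -> int:
    """Stand-in for the density-based estimate: median horizontal ink-run length
    inside the boxes (assumes most words in a group are regular weight)."""
    runs: List[int] = []
    for top, left, bottom, right in boxes:
        for r in range(top, bottom):
            run = 0
            for c in range(left, right):
                if img[r][c]:
                    run += 1
                elif run:
                    runs.append(run)
                    run = 0
            if run:
                runs.append(run)
    runs.sort()
    return runs[len(runs) // 2] if runs else 1


def open_square(img: Image, k: int) -> Image:
    """Binary opening with a k x k square: keep only ink covered by a fully-inked k x k window."""
    h, w = len(img), len(img[0])
    # Erosion: mark the top-left corner of every k x k window that is entirely ink.
    eroded = [[0] * w for _ in range(h)]
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            if all(img[r + i][c + j] for i in range(k) for j in range(k)):
                eroded[r][c] = 1
    # Dilation: paint each surviving k x k window back.
    opened = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if eroded[r][c]:
                for i in range(k):
                    for j in range(k):
                        opened[r + i][c + j] = 1
    return opened


def detect_bold(img: Image, words: List[Box], tol: float = 0.2,
                min_survivors: int = 1) -> Tuple[List[Box], Image]:
    """Open each font-size group with its own square, union the results, and keep the
    words that still contain at least `min_survivors` ink pixels."""
    h, w = len(img), len(img[0])
    union = [[0] * w for _ in range(h)]
    for group in group_by_height(words, tol):
        k = stroke_width(img, group) + 1  # hypothetical: one pixel wider than a regular stroke
        sub = [[0] * w for _ in range(h)]  # sub-image holding only this group's words
        for top, left, bottom, right in group:
            for r in range(top, bottom):
                for c in range(left, right):
                    sub[r][c] = img[r][c]
        opened = open_square(sub, k)
        for r in range(h):
            for c in range(w):
                union[r][c] |= opened[r][c]
    bold = [b for b in words
            if sum(union[r][c] for r in range(b[0], b[2]) for c in range(b[1], b[3])) >= min_survivors]
    # Final image: the bold words extracted from the binarized page.
    out = [[0] * w for _ in range(h)]
    for top, left, bottom, right in bold:
        for r in range(top, bottom):
            for c in range(left, right):
                out[r][c] = img[r][c]
    return bold, out
```

Grouping by height before choosing the structuring element matters because a regular stroke in large type can be as wide as a bold stroke in small type, so a single page-wide element size could misclassify one or the other.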