
Word spotting in bitmap images using word bounding boxes and hidden Markov models


Abstract

Font-independent spotting of user-defined keywords in a scanned image. Word identification is based on features of the entire word, without the need for segmentation or OCR and without the need to recognize non-keywords. Font-independent character models are created using hidden Markov models (HMMs), and arbitrary keyword models are built from the character HMM components. Word or text-line bounding boxes are extracted from the image. Within each bounding box, a set of features based on the word shape (and preferably also on the word's internal structure) is extracted. This set of features is applied to a network that includes one or more keyword HMMs, and a determination is made as to whether the word in the box matches a keyword. The identification of word bounding boxes for potential keywords includes reducing the image (for example, by 2×) and subjecting the reduced image to vertical and horizontal morphological closing operations. The bounding boxes of connected components in the resulting image are then used to hypothesize word or text-line bounding boxes, and the original bitmaps within the boxes are used to hypothesize words. In a particular embodiment, a range of structuring elements is used for the closing operations to accommodate the variation of inter- and intra-character spacing with font and font size.
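The bounding-box step described in the abstract can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the patented implementation. The 2× reduction here turns an output pixel ON if any pixel of its 2×2 block is ON. The function names `reduce_2x` and `hypothesize_boxes`, the structuring-element lengths `h_lengths` and `v_length`, and the 8-connected labeling are all hypothetical choices. Each horizontal length in the range yields its own set of candidate boxes, and the union is returned. This loosely mirrors the idea of using a range of structuring elements to cope with varying character spacing.

```python
import numpy as np
from scipy import ndimage


def reduce_2x(bitmap: np.ndarray) -> np.ndarray:
    """Reduce a binary image by 2x: an output pixel is ON if any pixel of its 2x2 block is ON."""
    h, w = bitmap.shape
    # Pad with background to even dimensions so the image splits into whole 2x2 blocks.
    even = np.pad(bitmap, ((0, h % 2), (0, w % 2)))
    return even.reshape(even.shape[0] // 2, 2, even.shape[1] // 2, 2).any(axis=(1, 3))


def hypothesize_boxes(bitmap, h_lengths=(3, 5, 7), v_length=3):
    """Return candidate boxes (top, left, bottom, right) in original-image pixels.

    h_lengths and v_length are hypothetical structuring-element lengths,
    measured in pixels of the 2x-reduced image.
    """
    h, w = bitmap.shape
    reduced = reduce_2x(bitmap.astype(bool))
    eight_connected = np.ones((3, 3), dtype=bool)
    boxes = set()
    for h_len in h_lengths:
        # Pad with background so erosion at the image edge cannot remove foreground.
        m = max(h_len, v_length)
        work = np.pad(reduced, m)
        # Horizontal closing bridges gaps of up to h_len - 1 pixels between characters;
        # vertical closing joins vertically separated pieces such as an i-dot and its stem.
        work = ndimage.binary_closing(work, structure=np.ones((1, h_len), dtype=bool))
        work = ndimage.binary_closing(work, structure=np.ones((v_length, 1), dtype=bool))
        work = work[m:-m, m:-m]
        labels, _ = ndimage.label(work, structure=eight_connected)
        for rows, cols in ndimage.find_objects(labels):
            # Scale each connected component's box back up to original resolution.
            boxes.add((rows.start * 2, cols.start * 2,
                       min(rows.stop * 2, h), min(cols.stop * 2, w)))
    return sorted(boxes)
```

Because every closing size contributes its own boxes, the output deliberately over-generates. Short lengths give character-level boxes, and longer ones can merge neighboring characters or words. The bitmaps inside all of these hypotheses are then passed on to the keyword models.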
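The recognition part of the abstract can also be sketched in Python. It covers character HMMs, keyword HMMs assembled from them, and word-shape features scored against a keyword network. This hedged example illustrates the general technique and is not the patent's model.

- Each character HMM (`CharHMM`) is a hypothetical left-to-right chain of states with diagonal Gaussian emissions.
- A keyword HMM is formed by concatenating the character chains in spelling order.
- A word bitmap is scored with the Viterbi algorithm.
- The per-column features stand in for the word-shape and internal-structure features: top contour, bottom contour, and ink count, each divided by box height.
- The per-column score normalization and the acceptance `threshold` in `spot` are assumptions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CharHMM:
    """Hypothetical left-to-right character model with diagonal Gaussian emissions."""
    means: np.ndarray        # shape (n_states, n_features)
    variances: np.ndarray    # shape (n_states, n_features)
    stay_prob: float = 0.6   # self-loop probability; 1 - stay_prob moves to the next state


def column_features(word_bitmap):
    """Per-column word-shape features, each divided by the box height:
    distance from the top to the first ink pixel, distance from the bottom
    to the last ink pixel, and the number of ink pixels in the column."""
    img = word_bitmap.astype(bool)
    h = img.shape[0]
    has_ink = img.any(axis=0)
    top = np.where(has_ink, img.argmax(axis=0), h)
    bottom = np.where(has_ink, img[::-1].argmax(axis=0), h)
    ink = img.sum(axis=0)
    return np.stack([top, bottom, ink], axis=1) / h


def build_keyword_hmm(keyword, char_models):
    """Concatenate the character HMMs of `keyword` left to right into one chain."""
    parts = [char_models[c] for c in keyword]
    means = np.vstack([p.means for p in parts])
    variances = np.vstack([p.variances for p in parts])
    stay = np.concatenate([np.full(len(p.means), p.stay_prob) for p in parts])
    return means, variances, stay


def log_gaussian(x, means, variances):
    """Log density of one feature vector under every state's diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)


def viterbi_score(features, model):
    """Best-path log-likelihood; the path starts in the first state and ends in the last."""
    means, variances, stay = model
    log_stay, log_move = np.log(stay), np.log1p(-stay)
    delta = np.full(len(means), -np.inf)
    delta[0] = log_gaussian(features[0], means, variances)[0]
    for x in features[1:]:
        # Each state is reached either by its self-loop or from the previous state.
        moved = np.full(len(means), -np.inf)
        moved[1:] = delta[:-1] + log_move[:-1]
        delta = np.maximum(delta + log_stay, moved) + log_gaussian(x, means, variances)
    return delta[-1]


def spot(word_bitmap, keywords, char_models, threshold):
    """Return the best-scoring keyword if its per-column score exceeds threshold, else None."""
    feats = column_features(word_bitmap)
    scores = {k: viterbi_score(feats, build_keyword_hmm(k, char_models)) / len(feats)
              for k in keywords}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None


# Toy usage with two hypothetical two-state character models.
models = {
    "l": CharHMM(means=np.array([[0.0, 0.0, 1.0]] * 2), variances=np.full((2, 3), 0.05)),
    "o": CharHMM(means=np.array([[0.5, 0.0, 0.5]] * 2), variances=np.full((2, 3), 0.05)),
}
word = np.zeros((4, 4), dtype=bool)
word[:, 0:2] = True   # two full-height columns, shaped like an "l"
word[2:, 2:4] = True  # two lower-half columns, shaped like an "o"
result = spot(word, ["lo", "ol"], models, threshold=0.0)  # → "lo"
```

Both keyword models are built from the same two character models, and nothing is trained per keyword. This reflects the abstract's point that arbitrary keyword models are assembled from character HMM components.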
