Journal: Systems Science

MULTISTAGE SEMI-AUTOMATIC TEXT IMAGE SEGMENTATION FOR TRAINING SET ACQUISITION IN HANDWRITING RECOGNITION



Abstract

This paper proposes a complete method for segmenting a text image into images of individual characters. The ultimate aim of the segmentation is to prepare a set of correctly labeled character samples for training the character classifier that serves as a component of a handwritten word recognizer. The proposed method consists of two stages. In the first stage, the text image is divided into lines, and the lines are then segmented into words. This stage uses the known spelling of the text in the image, so that the number of segments obtained equals the number of words in the text; information about the expected widths of the known words is also used. In the second stage, the resulting images of known words are segmented into individual characters by a multiphase procedure. The procedure first segments each word independently, using estimates of character widths obtained by analyzing the complete text corpus. It then elaborates a global segmentation of the text that maximizes similarity measures of the samples extracted for all alphabet characters; a genetic algorithm is applied in this phase. Finally, the segmentation variants represented by chromosomes in the terminal population of the genetic algorithm are locally refined, and the most dissimilar samples in the sets corresponding to the alphabet characters are rejected. Experiments showed that recognizers trained on a training set obtained with the proposed method achieve handwriting recognition accuracy close to that achievable with a training set prepared by a human expert.
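The first-stage idea of obtaining exactly as many segments as there are known words can be illustrated with a small sketch. The abstract does not give the actual procedure, so what follows is only one simple way to do it, under stated assumptions: the line image is binary (1 = ink), word boundaries are taken to be the widest runs of empty columns, and the expected-width information mentioned in the abstract is ignored. The function names `column_profile`, `find_gaps`, and `split_into_words` are hypothetical.

```python
from typing import List, Tuple


def column_profile(line_img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in every column of a binary line image."""
    return [sum(col) for col in zip(*line_img)]


def find_gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Return interior runs of empty columns as (start, end_exclusive) pairs."""
    gaps = []
    start = None
    for x, ink in enumerate(profile):
        if ink == 0 and start is None:
            start = x                      # an empty run begins here
        elif ink != 0 and start is not None:
            gaps.append((start, x))        # the run ends before column x
            start = None
    # A run still open at the end touches the right margin and is never
    # appended; a run starting at column 0 touches the left margin.
    return [g for g in gaps if g[0] > 0]


def split_into_words(line_img: List[List[int]], n_words: int) -> List[Tuple[int, int]]:
    """Cut a text line into exactly n_words column ranges.

    Assumption (not from the paper): the n_words - 1 widest gaps are the
    word boundaries, and each cut is placed at the middle of its gap.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    profile = column_profile(line_img)
    gaps = find_gaps(profile)
    if len(gaps) < n_words - 1:
        raise ValueError("fewer gaps in the line than word boundaries needed")
    widest = sorted(gaps, key=lambda g: g[1] - g[0], reverse=True)[: n_words - 1]
    cuts = sorted((s + e) // 2 for s, e in widest)
    bounds = [0] + cuts + [len(profile)]
    return list(zip(bounds[:-1], bounds[1:]))


# Usage: a one-row "image" with gaps of width 1, 3 and 4 between ink runs.
row = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
segments = split_into_words([row], n_words=3)
```

Because the word count is fixed by the known transcription, the narrow one-column gap in this example is treated as an intra-word gap rather than a word boundary. A fuller version would presumably also weigh each candidate cut against the expected word widths, as the abstract indicates.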
