Font classification and character segmentation for postal address reading.

Abstract

This dissertation introduces a new font classification approach and a new character segmentation algorithm to improve the recognition of return addresses on US mail pieces.

The proposed font classifier identifies the font style, font group, and font name from a single word of a return address. Because the classification is a priori and local, it allows an OCR system to be built from multiple font-specific character segmentation tools and multiple mono-font character recognizers.

The font classifier uses the ascenders, descenders, and serifs extracted from a word image. Gradient features are extracted from these sub-images and fed to a neural network classifier, which produces the font classification result. The classifier can identify the font even from a single word with severely touching characters.

The proposed character segmentation is font-specific: it uses side profiles selected according to the font group. Where characters touch, the merged region forms a pattern whose shape differs from the primitive character patterns. However, the leftmost and rightmost sides of a run of touching characters are not affected by the touching.

Because each character's side profile is unique, analyzing these side profiles yields candidate single characters for the touching characters. A cutting cost and a tangent cost are defined to find the optimal segmentation path.

Font classification accuracy reached about 95.4%, even with severely touching characters, across seven PostScript fonts: Avant Garde, Bookman, Courier, Helvetica, New Century Schoolbook, Palatino, and Times.

The character segmentation was evaluated in a real envelope reader system that recognizes return addresses on US mail pieces and sorts the mail by sender. On 3359 test mail pieces, the proposed character segmentation improved performance from 68.92% to 80.08%.
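The abstract names the ingredients of the segmentation stage (side profiles, candidate characters matched from those profiles, and a cutting cost plus a tangent cost for the segmentation path) but does not define them. The Python sketch below illustrates one plausible reading under explicit assumptions. It is not the dissertation's implementation. The function names `left_profile`, `right_profile`, `rank_candidates`, and `best_cut_path`, the L1 profile distance, and both cost definitions are hypothetical. Here the "cutting cost" is assumed to be the number of ink pixels a cut crosses, and the "tangent cost" is assumed to be a fixed penalty for each slanted step of the cut.

```python
# Hypothetical sketch of side-profile analysis and a minimum-cost cut path.
# The abstract does not define the dissertation's "cutting cost" or "tangent
# cost"; the definitions below (ink crossed, penalty per slanted step) are
# illustrative assumptions only.
from typing import Dict, List, Tuple

Image = List[List[int]]  # binary word image: rows of 0 (background) / 1 (ink)


def left_profile(img: Image) -> List[int]:
    """Per row, distance from the left edge to the first ink pixel (width if blank)."""
    width = len(img[0])
    return [next((x for x, v in enumerate(row) if v), width) for row in img]


def right_profile(img: Image) -> List[int]:
    """Per row, distance from the right edge to the last ink pixel (width if blank)."""
    width = len(img[0])
    return [next((x for x, v in enumerate(reversed(row)) if v), width) for row in img]


def rank_candidates(profile: List[int], templates: Dict[str, List[int]]) -> List[str]:
    """Rank template characters by L1 distance between side profiles (assumed metric).

    Assumes all profiles were normalized to the same number of rows.
    """
    def distance(ch: str) -> int:
        return sum(abs(a - b) for a, b in zip(profile, templates[ch]))
    return sorted(templates, key=distance)


def best_cut_path(img: Image, tangent_weight: float = 0.5) -> Tuple[float, List[int]]:
    """Dynamic-programming search for a top-to-bottom cut through touching characters.

    Each step moves down one row and at most one column sideways.
    Assumed cutting cost: 1 for every ink pixel the path passes through.
    Assumed tangent cost: tangent_weight for every diagonal step.
    Returns (total cost, cut column for each row).
    """
    height, width = len(img), len(img[0])
    cost = [[float("inf")] * width for _ in range(height)]
    back = [[0] * width for _ in range(height)]
    cost[0] = [float(v) for v in img[0]]
    for y in range(1, height):
        for x in range(width):
            for px in (x - 1, x, x + 1):
                if 0 <= px < width:
                    step = img[y][x] + (tangent_weight if px != x else 0.0)
                    if cost[y - 1][px] + step < cost[y][x]:
                        cost[y][x] = cost[y - 1][px] + step
                        back[y][x] = px
    # Choose the cheapest column in the last row, then follow back-pointers up.
    end = min(range(width), key=lambda x: cost[-1][x])
    path = [end]
    for y in range(height - 1, 0, -1):
        path.append(back[y][path[-1]])
    path.reverse()
    return cost[-1][end], path


# Usage: a toy glyph and two toy glyphs joined by one touching stroke.
glyph = [[0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 1, 0]]
print(left_profile(glyph), right_profile(glyph))  # [1, 0, 1] [1, 3, 1]
print(rank_candidates(left_profile(glyph), {"c": [1, 0, 1], "l": [0, 0, 0]}))  # ['c', 'l']

touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],  # a stroke joins the two glyphs
    [1, 1, 0, 1, 1],
]
print(best_cut_path(touching))  # (1.0, [2, 2, 2, 2])
```

The dynamic-programming formulation keeps the cut connected from top to bottom. `tangent_weight` sets the trade-off between cutting through fewer ink pixels and keeping the cut close to vertical. The font classification stage (gradient features fed to a neural network) is not sketched here.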