
Modelling polyfont printed characters with HMMs and a shift invariant Hamming distance


Abstract

Rumours of the death of the problem of machine-printed text recognition have been greatly exaggerated. Reported results can be good enough to lead one to believe that this is a "solved problem". Closer analysis reveals test data that is often limited in its range of fonts and point sizes. Worse still, results are commonly quoted for noise-free images, ignoring the problems of recognising "real" documents such as faxes. Various methods have been proposed for modelling characters with Hidden Markov Models (HMMs). The authors, amongst others, have suggested representing a character by analysing the pixel pattern in columns of its image, and linking sequential column patterns together with an HMM. In this paper we propose a method of quantising the patterns by means of a Shift Invariant Hamming Distance. A full experimental evaluation (45 fonts, 5 point sizes) in typical noise results in a recognition accuracy of 99% in the top-3 choices, and 94% top-choice for the best font. The method has a significant advantage in recognising noisy word images, because classification is achieved without a prior segmentation of the word into characters.
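
The abstract does not define the Shift Invariant Hamming Distance, so the following is only an illustrative sketch under stated assumptions: each image column is a binary pixel vector, the distance is the smallest ordinary Hamming distance over a small range of vertical shifts, and quantisation assigns a column to its nearest codebook entry. The names `shift`, `shift_invariant_hamming`, `quantise`, the `max_shift` parameter and the zero-padding rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length binary columns differ."""
    if len(a) != len(b):
        raise ValueError("columns must have the same height")
    return sum(x != y for x, y in zip(a, b))


def shift(column: Sequence[int], offset: int) -> List[int]:
    """Shift a column by `offset` pixels (positive = towards higher indices).

    Assumption: pixels shifted out are dropped and the gap is filled with
    0 (background). The paper may handle borders differently.
    """
    n = len(column)
    k = min(abs(offset), n)
    if offset >= 0:
        return [0] * k + list(column[: n - k])
    return list(column[k:]) + [0] * k


def shift_invariant_hamming(a: Sequence[int], b: Sequence[int],
                            max_shift: int = 2) -> int:
    """Smallest Hamming distance between `b` and any shifted copy of `a`."""
    return min(hamming(shift(a, s), b)
               for s in range(-max_shift, max_shift + 1))


def quantise(column: Sequence[int], codebook: Sequence[Sequence[int]],
             max_shift: int = 2) -> int:
    """Return the index of the codebook pattern closest to `column`."""
    return min(range(len(codebook)),
               key=lambda i: shift_invariant_hamming(column, codebook[i],
                                                     max_shift))


if __name__ == "__main__":
    observed = [0, 1, 1, 0, 0]
    pattern = [0, 0, 1, 1, 0]   # same stroke, one pixel lower
    print(hamming(observed, pattern))                  # plain distance
    print(shift_invariant_hamming(observed, pattern))  # distance after alignment
```

In a pipeline of the kind the abstract describes, the sequence of codebook indices returned by `quantise` for successive columns would serve as the discrete observation sequence for a character HMM. Tolerating small vertical shifts means that a stroke displaced by a pixel or two, for example by noise or baseline jitter, still maps to the same symbol.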
