Conference: Document Recognition III

Character recognition of Japanese newspaper headlines with graphical designs


Abstract

Graphical designs are often used in Japanese newspaper headlines to highlight hot articles. Conventional OCR software seldom recognizes the characters in such headlines, because the designs are difficult to remove. This paper proposes a method that recognizes these characters without removing the graphical designs. First, the number of text-line regions and the average character height are roughly estimated from the local distribution of black and white runs observed in a rectangular window as the window is shifted pixel by pixel along the text-line direction. Next, each text-line region is normalized by scaling its height to the height of the binary reference patterns in a dictionary. Displacement matching is then applied to the normalized text-line region for character recognition: a square window is shifted pixel by pixel along the text-line direction and, at each position, matched against the binary reference patterns. The complementary similarity measure, which is robust against graphical designs, is used as the discriminant function. When the maximum similarity value at a position exceeds a threshold, which is determined automatically from the degree of degradation in the square window, the character category that produced this value is output as the recognized category. Experimental results for fifty Japanese newspaper headlines show that the method achieves recognition rates of over 90%, much higher than those of a conventional method (17%).
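
The displacement-matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the formula for the complementary similarity measure, so the sketch assumes the form commonly given in the literature, (a·e − b·c) / sqrt(T·(n − T)), where a, b, c and e count the four black/white pixel combinations and T is the number of black pixels in the reference pattern. The function names, the 3×3 toy patterns and the fixed threshold are all hypothetical; in particular, the paper derives its threshold automatically from the degradation in the window, which is not reproduced here.

```python
from math import sqrt

def complementary_similarity(window, ref):
    """Assumed form of the complementary similarity measure.

    window, ref: equal-length flat sequences of 0/1 pixels (1 = black).
    """
    n = len(ref)
    a = sum(f * t for f, t in zip(window, ref))          # black in both
    b = sum(f * (1 - t) for f, t in zip(window, ref))    # black in window only
    c = sum((1 - f) * t for f, t in zip(window, ref))    # black in reference only
    e = n - a - b - c                                    # white in both
    t_black = a + c                                      # black pixels in reference
    denom = sqrt(t_black * (n - t_black))
    if denom == 0:
        return 0.0  # all-white or all-black reference: measure undefined
    return (a * e - b * c) / denom

def displacement_match(line, refs, threshold):
    """Slide a square window along a height-normalized text line.

    line: list of h rows of 0/1 pixels; refs: dict label -> h x h pattern.
    threshold: hypothetical fixed value (the paper adapts it per window).
    Returns (x, label, score) for every position whose best score exceeds it.
    """
    h = len(line)
    width = len(line[0])
    flat_refs = {label: [p for row in pat for p in row]
                 for label, pat in refs.items()}
    hits = []
    for x in range(width - h + 1):
        # Row-major flattening of the h x h window starting at column x.
        window = [p for row in line for p in row[x:x + h]]
        label, score = max(
            ((lab, complementary_similarity(window, ref))
             for lab, ref in flat_refs.items()),
            key=lambda pair: pair[1],
        )
        if score > threshold:
            hits.append((x, label, score))
    return hits

# Toy dictionary of two 3x3 reference patterns (illustrative only).
refs = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# A 3-pixel-high text line containing "I" at x=0 and "L" at x=4.
line = [
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 0],
]
hits = displacement_match(line, refs, threshold=4.0)
for x, label, score in hits:
    print(x, label, round(score, 3))
```

Under the assumed formula, extra black pixels from a design that overlaps a character only increase b and reduce e, so the score of the correct pattern drops gradually rather than collapsing. This is consistent with the robustness the abstract attributes to the measure.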
