
A system for reading low quality characters from printouts

Abstract

This paper presents a system for reading low-quality machine-printed characters. It is intended for recovering data from computer printouts when the original data file is not available. The characters are assumed to belong to one of a set of known fonts and to be arranged in tables or columns. Such documents are usually produced on fast printers with low print quality, caused by a worn ink ribbon or by damaged nozzles or print heads. Because standard OCR systems for machine-printed text show an error rate of about 15% on these sheets, a specific technique is needed. To cope with broken characters and character fragments, the system uses a two-step strategy. First, it tries to match the unknown character against the reference images using a moving-window technique. If this fails, it builds a new reference image set from the characters already recognized in the document and repeats the first matching step. In this way the correlation among damaged characters is exploited. The first step alone reaches a 2% error rate, and applying the second step lowers it to 0.15%. This low error rate is possible because the system adapts its behavior to the damaged characters produced by the particular printer. The average recognition time on a SUN SparcStation 10 is 15 ms/character, measured on about 100,000 characters contained in 50 documents.
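The two-step strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the pixel-agreement score, the acceptance threshold of 0.85, the one-pixel shift range, and the 3x3 toy glyphs are all assumptions chosen for illustration, and the names `overlap_score`, `best_match`, and `recognize` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # binary glyph: 1 = ink, 0 = background


def overlap_score(glyph: Image, ref: Image, dy: int, dx: int) -> float:
    """Fraction of glyph pixels that agree with `ref` shifted by (dy, dx)."""
    height, width = len(glyph), len(glyph[0])
    agree = 0
    for y in range(height):
        for x in range(width):
            ry, rx = y - dy, x - dx
            inside = 0 <= ry < len(ref) and 0 <= rx < len(ref[0])
            ref_pixel = ref[ry][rx] if inside else 0  # outside the reference = background
            agree += int(glyph[y][x] == ref_pixel)
    return agree / (height * width)


def best_match(glyph: Image, refs: Dict[str, List[Image]],
               max_shift: int = 1) -> Tuple[Optional[str], float]:
    """Moving-window match: try every reference at every small offset."""
    best_label, best_score = None, -1.0
    for label, images in refs.items():
        for ref in images:
            for dy in range(-max_shift, max_shift + 1):
                for dx in range(-max_shift, max_shift + 1):
                    score = overlap_score(glyph, ref, dy, dx)
                    if score > best_score:
                        best_label, best_score = label, score
    return best_label, best_score


def recognize(glyphs: List[Image], font: Dict[str, List[Image]],
              threshold: float = 0.85) -> List[Optional[str]]:
    """Step 1: match against the known font. Step 2: rematch the failures
    against glyphs already recognized in the same document."""
    results: List[Optional[str]] = []
    for glyph in glyphs:
        label, score = best_match(glyph, font)
        results.append(label if score >= threshold else None)

    # Adapted reference set: the document's own (damaged) recognized glyphs.
    adapted: Dict[str, List[Image]] = {}
    for glyph, label in zip(glyphs, results):
        if label is not None:
            adapted.setdefault(label, []).append(glyph)

    for i, glyph in enumerate(glyphs):
        if results[i] is None:
            label, score = best_match(glyph, adapted)
            if score >= threshold:
                results[i] = label
    return results


# Toy data (assumed): clean font references and three printed glyphs.
FONT = {
    "T": [[[1, 1, 1], [0, 1, 0], [0, 1, 0]]],
    "L": [[[1, 0, 0], [1, 0, 0], [1, 1, 1]]],
}
WORN_T = [[1, 1, 0], [0, 1, 0], [0, 1, 0]]    # one pixel missing
BROKEN_T = [[1, 1, 0], [1, 1, 0], [0, 1, 0]]  # same defect plus a stray blot
CLEAN_L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]

results = recognize([WORN_T, BROKEN_T, CLEAN_L], FONT)
print(results)  # ['T', 'T', 'L']
```

In this toy run, `BROKEN_T` scores only 7/9 against the clean font and is rejected in the first pass, but it scores 8/9 against `WORN_T`, which was recognized earlier and carries the same printer defect. That is the sense in which the second step exploits the correlation among damaged characters.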
