Conference: Document Recognition III

Intelligent form removal with character stroke preservation

Abstract

A new technique for intelligent form removal has been developed, along with a new method for evaluating its impact on optical character recognition (OCR). All the dominant lines in the image are automatically detected using the Hough line transform and intelligently erased, while overlapping character strokes are preserved by computing line width statistics and keying off certain visual cues. This new method of form removal operates on loosely defined zones with no image deskewing. Any field in which the writer is provided a horizontal line on which to enter a response can be processed by this method. Several examples of processed fields are provided, including a comparison of results between the new method and a commercially available forms removal package. Even if this new form removal method did not improve character recognition accuracy, it would still be a significant improvement to the technology, because the requirement of a priori knowledge of the form's geometric details has been greatly reduced. This relaxes the recognition system's dependence on rigid form design, printing, and reproduction by automatically detecting and removing some of the physical structures (lines) on the form. Using the National Institute of Standards and Technology (NIST) public domain form-based handprint recognition system, the technique was tested on a large number of fields containing randomly ordered handprinted lowercase alphabets, as these letters (especially those with descenders) frequently touch and extend through the line along which they are written. Preserving character strokes improves overall lowercase recognition performance by 3%, which is a net improvement, but a single performance number like this does not communicate how the recognition process was really influenced. Trade-offs are to be expected with the introduction of any new technique into a complex recognition system.
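The line detection and stroke-preserving erasure summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the visual cues used, so this sketch substitutes one simple assumed rule (erase a column's line pixels only when the vertical run of ink through the line is no taller than the median run plus a small slack). The function names `dominant_line`, `vertical_run`, `remove_line`, and `demo_field`, and the parameters `max_skew_deg`, `step_deg`, `slack`, and `reach`, are all hypothetical.

```python
import math
from statistics import median

def dominant_line(img, max_skew_deg=5.0, step_deg=0.5):
    """Hough vote restricted to near-horizontal lines.

    img is a list of rows of 0/1 pixels. Returns (theta, rho) of the
    accumulator cell with the most votes, or None for an empty image.
    """
    best_votes, best = 0, None
    steps = int(round(2 * max_skew_deg / step_deg))
    for k in range(steps + 1):
        theta = math.radians(90.0 - max_skew_deg + k * step_deg)
        c, s = math.cos(theta), math.sin(theta)
        acc = {}
        for y, row in enumerate(img):
            for x, v in enumerate(row):
                if v:
                    rho = round(x * c + y * s)  # normal form of a line
                    acc[rho] = acc.get(rho, 0) + 1
        for rho, votes in acc.items():
            if votes > best_votes:
                best_votes, best = votes, (theta, rho)
    return best

def vertical_run(img, x, y):
    """Extent (top, bottom) of the vertical run of ink through (x, y)."""
    top = bottom = y
    while top > 0 and img[top - 1][x]:
        top -= 1
    while bottom < len(img) - 1 and img[bottom + 1][x]:
        bottom += 1
    return top, bottom

def remove_line(img, theta, rho, slack=1, reach=1):
    """Erase the detected line, keeping columns where a taller stroke crosses it.

    Assumed rule (not from the source): a run taller than the median
    run height plus `slack` is treated as a character stroke and kept.
    """
    h, w = len(img), len(img[0])
    c, s = math.cos(theta), math.sin(theta)
    runs = {}
    for x in range(w):
        y0 = round((rho - x * c) / s)
        # Probe the predicted row first, then its neighbours within `reach`.
        for dy in sorted(range(-reach, reach + 1), key=abs):
            y = y0 + dy
            if 0 <= y < h and img[y][x]:
                runs[x] = vertical_run(img, x, y)
                break
    out = [row[:] for row in img]
    if not runs:
        return out
    line_width = median(b - t + 1 for t, b in runs.values())
    for x, (t, b) in runs.items():
        if b - t + 1 <= line_width + slack:
            for y in range(t, b + 1):
                out[y][x] = 0
    return out

def demo_field(width=20, height=10):
    """Synthetic field: a 2-pixel-thick form line crossed by one descender."""
    img = [[0] * width for _ in range(height)]
    for y in (5, 6):
        for x in range(width):
            img[y][x] = 1
    for y in range(1, 9):
        img[y][8] = 1
    return img
```

Calling `remove_line(img, *dominant_line(img))` on the `demo_field()` image erases the form line in every column except the one the descender passes through. A real system would also need to reconnect or trim the short line segment left at the crossing, which this sketch does not attempt.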
To understand both the improvements and the trade-offs, a new analysis was designed to compare the statistical distributions of individual confusion pairs between two systems. As OCR technology continues to improve, sophisticated analyses like this are necessary to reduce the errors remaining in complex recognition problems.
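A minimal sketch of what a confusion-pair comparison between two systems might look like follows. The abstract does not describe the statistical test used, so this example only tabulates per-pair error counts and their change; the function names `confusion_pairs` and `compare_confusions` and the sample data are hypothetical.

```python
from collections import Counter

def confusion_pairs(truth, predicted):
    """Count (true, recognized) pairs for characters that were misrecognized."""
    return Counter((t, p) for t, p in zip(truth, predicted) if t != p)

def compare_confusions(truth, baseline, candidate):
    """Map each confusion pair to (baseline_count, candidate_count, delta).

    A negative delta means the candidate system makes that particular
    error less often than the baseline; a positive delta is a trade-off.
    """
    before = confusion_pairs(truth, baseline)
    after = confusion_pairs(truth, candidate)
    return {
        pair: (before[pair], after[pair], after[pair] - before[pair])
        for pair in sorted(set(before) | set(after))
    }

# Made-up illustration data, not results from the paper.
truth = list("ggyyqa")
baseline = list("aavyqa")   # system without stroke preservation (hypothetical)
candidate = list("gayyga")  # system with stroke preservation (hypothetical)

for pair, (b, c, d) in compare_confusions(truth, baseline, candidate).items():
    print(pair, b, c, d)
```

Breaking the comparison down by pair shows which specific confusions a change fixes and which it introduces, which a single overall accuracy figure hides.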