Conference: Document Recognition II

Public domain optical character recognition

Abstract

A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and the types of forms processed are all publicly available. The system recognizes the handprint entered on handwriting sample forms like the ones distributed with NIST Special Database 1. From these forms, the system reads hand-printed numeric fields, uppercase and lowercase alphabetic fields, and unconstrained text paragraphs composed of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized probabilistic neural network (PNN) classifier that runs 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations. This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.