Conference: Society of Photo-Optical Instrumentation Engineers Conference on Document Recognition and Retrieval

Extraction of valid data sets in registers using recognition of invalidation lines


Abstract

This paper describes an approach for extracting the valid data sets from legal registers in which some data have been invalidated by invalidation lines. Invalidation lines are hand-drawn lines below the invalid words or text lines. In a first step, horizontal lines are detected and the text objects (block, line, word, character) are segmented using a fast connected component analysis based on sub-components, which is robust against touching lines. Invalidation is then determined at the word or text line level using the neighborhood relation between text objects and invalidation lines. Invalidations are recognized with about 90% accuracy at a rejection threshold of about 10% (false negatives). The error rate (i.e. invalidation of a valid word) is less than 0.5%. For most data sets it is sufficient to eliminate the invalidated text so that the valid data remain. In a second step, a syntactical analysis of the valid text strings is performed, which increases the accuracy to 99% at the data set level. Error detection and correction are done through a graphical user interface. Data capture time can be reduced by a factor of 2 to 3 compared with manual input.
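The abstract does not give the details of the neighborhood relation between text objects and invalidation lines, so the following is only a minimal sketch of how such a test could look, not the authors' method. It assumes that words and detected horizontal lines are already available as bounding boxes, and that a word counts as invalidated when a line lies just below it and covers most of its width. The names `Box`, `horizontal_overlap`, and `classify_word`, the thresholds `max_gap`, `touch_tolerance`, `accept`, and `reject`, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates; y grows downward."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self) -> int:
        return self.right - self.left


def horizontal_overlap(word: Box, line: Box) -> float:
    """Fraction of the word's width that the line covers horizontally."""
    if word.width <= 0:
        return 0.0
    shared = min(word.right, line.right) - max(word.left, line.left)
    return max(0, shared) / word.width


def classify_word(word, lines, max_gap=8, touch_tolerance=4,
                  accept=0.8, reject=0.5):
    """Return 'invalid', 'rejected' (needs review), or 'valid'.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    best = 0.0
    for line in lines:
        # Vertical distance from the word's bottom edge to the line's top
        # edge; slightly negative values allow for lines touching the text.
        gap = line.top - word.bottom
        if -touch_tolerance <= gap <= max_gap:
            best = max(best, horizontal_overlap(word, line))
    if best >= accept:
        return "invalid"
    if best >= reject:
        return "rejected"
    return "valid"


# Hypothetical sample: three words on one text line, two detected lines.
words = {
    "alpha": Box(10, 20, 70, 40),
    "beta": Box(90, 20, 170, 40),
    "gamma": Box(190, 20, 250, 40),
}
lines = [Box(8, 43, 72, 45), Box(190, 43, 225, 45)]

results = {name: classify_word(box, lines) for name, box in words.items()}
valid_words = [name for name, label in results.items() if label == "valid"]
print(results)
print(valid_words)
```

In this sketch a fully underlined word is marked invalid, a partly underlined word is rejected for manual review, and only words labeled valid are passed on, which mirrors the idea that eliminating the invalidated text leaves the valid data.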
