Extraction of valid data sets in registers using recognition of invalidation lines

Abstract

This paper describes an approach for extracting the valid data sets from legal registers that contain data invalidated by invalidation lines. Invalidation lines are hand-drawn lines below the invalid words or text lines. In a first step, horizontal lines are detected and the text objects (blocks, lines, words, characters) are segmented using a fast connected-component analysis based on sub-components, which is robust against touching lines. Invalidation is then determined at the word or text-line level from the neighborhood relation between text objects and invalidation lines. Invalidations are recognized with about 90% accuracy at a rejection threshold of about 10% (false negatives), and the error rate (i.e. the invalidation of a valid word) is below 0.5%. For most data sets, eliminating the invalidated text is sufficient, leaving only the valid data. In a second step, a syntactical analysis is performed on the valid text strings, which raises the accuracy to 99% at the data-set level. Errors are detected and corrected through a graphical user interface. Compared with manual input, data capture time can be reduced by a factor of 2 to 3.
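The word-level invalidation step can be illustrated with a small geometric sketch. It is not the authors' implementation: it assumes that connected-component bounding boxes and word boxes are already available from segmentation, and the line test (`min_aspect`), the size of the neighborhood below a word (`max_gap`) and the two decision thresholds (`accept`, `reject`) are hypothetical choices. A word whose lower neighborhood is mostly spanned by a line is marked invalid, a partially covered word is rejected for manual review, and all other words are kept as valid.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box in image coordinates; y grows downward."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def is_horizontal_line(c: Box, min_aspect: float = 8.0) -> bool:
    # Hypothetical rule: a component much wider than it is tall is a line candidate.
    return c.width >= min_aspect * max(c.height, 1)


def coverage(word: Box, line: Box, max_gap: int) -> float:
    """Fraction of the word's width spanned by a line lying just below it."""
    gap = line.y0 - word.y1  # negative if the line touches the word's lower part
    if gap < -(word.height // 2) or gap > max_gap:
        return 0.0  # line is not in the word's lower neighborhood
    overlap = min(word.x1, line.x1) - max(word.x0, line.x0)
    return max(0, overlap) / max(word.width, 1)


def classify_words(words: List[Tuple[str, Box]], components: List[Box],
                   max_gap: int = 15, accept: float = 0.7,
                   reject: float = 0.3) -> List[Tuple[str, str]]:
    """Label each word 'invalid', 'reject' (manual review) or 'valid'."""
    lines = [c for c in components if is_horizontal_line(c)]
    labels = []
    for text, box in words:
        # Best coverage by any detected line in the word's neighborhood
        best = max((coverage(box, ln, max_gap) for ln in lines), default=0.0)
        if best >= accept:
            labels.append((text, "invalid"))
        elif best > reject:
            labels.append((text, "reject"))
        else:
            labels.append((text, "valid"))
    return labels


if __name__ == "__main__":
    words = [("Meier", Box(10, 10, 60, 30)),
             ("Hans", Box(70, 10, 110, 30)),
             ("1923", Box(120, 10, 160, 30))]
    components = [Box(10, 10, 20, 30),   # a character, not a line
                  Box(5, 33, 65, 35),    # full line under "Meier"
                  Box(70, 33, 90, 35)]   # line under half of "Hans"
    labels = classify_words(words, components)
    print(labels)
    # [('Meier', 'invalid'), ('Hans', 'reject'), ('1923', 'valid')]
    print([t for t, lab in labels if lab == "valid"])  # ['1923']
```

Raising `accept` or widening the band between `reject` and `accept` sends more words to manual correction in exchange for fewer wrongly invalidated words; this is the kind of trade-off that a rejection threshold controls.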

Bibliographic details

Source: Society of Photo-Optical Instrumentation Engineers Conference on Document Recognition and Retrieval, 2003, 6 pages
Authors: Gerd Maderlechner; Peter Suda
Original format: PDF
Chinese Library Classification: N-532
Keywords: document image analysis; layout analysis; segmentation; syntactical analysis; invalidation lines

Similar documents

1. Use of the recognition heuristic depends on the domain's recognition validity, not on the recognition validity of selected sets of objects [J]. Rüdiger F. Pohl, Martha Michalkiewicz, Edgar Erdfelder. Memory & Cognition, 2017, no. 5.

2. Legislative genesis and judicial death of a directive: The European Court of Justice invalidated the data retention directive (2006/24/EC) thereby creating a sustained period of legal uncertainty about the validity of national laws which enacted it [J]. Xavier Tracol. Computer Law & Security Report, 2014, no. 6.

3. Validity of a population-based cancer register in Sweden: an assessment of data reproducibility in the South-East Region Prostate Cancer Register [J]. G. Sandblom, M. Dufmats, M. Olsson. Scandinavian Journal of Urology and Nephrology, 2003, no. 2.

4. Extraction of valid data sets in registers using recognition of invalidation lines [C]. Gerd Maderlechner, Peter Suda. Society of Photo-Optical Instrumentation Engineers Conference on Document Recognition and Retrieval, 2003.

5. Analytical methods for the extraction of content from high resolution 3D data sets: Epigraphical applications to the Drakon Stele [D]. Stephanie Marie Sullivan. 2011.

6. Cohort profile: the Scottish Research register SHARE. A register of people interested in research participation linked to NHS data sets [O]. Brian McKinstry, Frank M. Sullivan, Shobna Vasishta. 2017.

7. Use of the recognition heuristic depends on the domain's recognition validity, not on the recognition validity of selected sets of objects [O]. Rüdiger F. Pohl, Martha Michalkiewicz, Edgar Erdfelder. 2017.
