5th International Workshop on Document Analysis Systems V DAS 2002, Aug 19-21, 2002, Princeton, NJ, USA

Document-Form Identification Using Constellation Matching of Keywords Abstracted by Character Recognition


Abstract

A document-form identification method based on constellation matching of targets is proposed. Mathematical analysis shows that the method achieves a high identification rate when plural targets are prepared. The method consists of two parts: (ⅰ) extraction of targets, such as important keywords in a document, by template matching between recognised characters and word strings in a keyword dictionary, and (ⅱ) analysis of the positional or semantic relationship between the targets by point-pattern matching between these targets and the word-location information in the keyword dictionary. All characters in the document are recognised by a conventional character-recognition method. An automatic keyword-determination method, which is needed to build the keyword dictionary beforehand, is also proposed. This method selects the most suitable keywords from a general word dictionary by measuring the uniqueness of the keywords and the stability of their recognition. Experiments using 671 sample documents covering 107 different forms confirmed that (ⅰ) the keyword-determination method can automatically determine sets of keywords for 92.5% of the 107 forms and (ⅱ) the form-identification method can correctly identify 97.1% of the 671 document samples at a rejection rate of 2.9%.
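
The point-pattern matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that keywords have already been extracted with page coordinates, that each form stores one position per keyword, and that forms differ only by a translation. The names `match_score`, `identify`, `tol` and `min_votes` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def match_score(detected, template, tol=10):
    """Count detected keywords that agree with the template under one shared offset.

    detected: list of (word, (x, y)) pairs found on the page (assumed input).
    template: dict mapping word -> (x, y) for one form (assumed dictionary layout).
    tol: bin size in page units used to group similar offsets.
    """
    votes = Counter()
    for word, (x, y) in detected:
        if word in template:
            tx, ty = template[word]
            # Each matched keyword votes for a quantised translation (dx, dy).
            votes[(round((x - tx) / tol), round((y - ty) / tol))] += 1
    # The score is the size of the largest group of keywords sharing one offset.
    return max(votes.values(), default=0)


def identify(detected, forms, min_votes=2):
    """Return the best-matching form name, or None to reject the document."""
    scores = {name: match_score(detected, tpl) for name, tpl in forms.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] < min_votes:
        return None  # too few keywords agree on a layout: reject
    return best


# Hypothetical keyword dictionary with two forms.
forms = {
    "invoice": {"Invoice": (100, 50), "Total": (400, 700), "Date": (450, 50)},
    "receipt": {"Receipt": (100, 50), "Total": (100, 300), "Date": (100, 100)},
}
# Keywords found on a page that is shifted by roughly (12, 11).
page = [("Invoice", (112, 61)), ("Total", (412, 711)), ("Date", (462, 61))]
result = identify(page, forms)
```

Requiring several keywords to agree on one offset is what lets a form be identified even when a shared word such as "Total" appears in more than one form; a page with too few consistent keywords is rejected instead of being forced into the nearest form. Because offsets are quantised into bins, two nearly equal offsets can fall on opposite sides of a bin boundary, so a more careful version would also count neighbouring bins.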
