Venue: Document Analysis Systems (DAS), 2008 Eighth IAPR Workshop on

Word Extraction Method by Generating Multiple Character Hypotheses


Abstract

Recognizing the logical structure of form images requires precise extraction of the words in headers and data fields. However, word extraction often fails because layout analysis or character recognition errors prevent the correct character hypotheses from being generated. We propose a word extraction method that generates multiple character hypotheses and extracts the combinations of them that match the character order of words. First, mutually overlapping character hypotheses are generated by combinatorial recognition of connected components, and the combinations that correspond to words are extracted by clique extraction from a graph. Then, character hypotheses are generated by recognition with a limited target, and the combinations that correspond to words are extracted by lattice matching based on local optima, which takes into account the variety of recognition results and regular expressions for words. We confirmed the effectiveness of our method in an experiment on form images.
