METHOD, SYSTEM, AND COMPUTER-READABLE RECORDING MEDIUM FOR RECOGNIZING CHARACTERS INCLUDED IN A DOCUMENT BY USING LANGUAGE MODEL AND OCR

Abstract

PURPOSE: A method, a system, and a computer-readable recording medium for recognizing characters included in a document by using a language model and OCR are provided to identify an image/noise region that has been misclassified as a text region, by referring to the location information of the characters input to an OCR device. CONSTITUTION: A first OCR (Optical Character Recognition) unit (130) recognizes a text string included in a text section by using a first OCR, and a second OCR (140) recognizes a text string that includes an image/noise section. A document structure analysis unit (150) analyzes the document structure and, by means of a language model, finds the text string that includes a misclassified region. Based on the location information for that region obtained from the first OCR, the region is reclassified as an image/noise section.
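The reclassification step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not specify the language model, the scoring rule, or the data layout, so `OcrToken`, `reclassify_regions`, `toy_plausibility`, the vocabulary, and the `0.5` threshold are all hypothetical names and values chosen for illustration. The sketch assumes an OCR pass has already produced tokens with bounding boxes, and it substitutes a simple vocabulary lookup for a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class OcrToken:
    text: str  # token recognized by an OCR pass (hypothetical structure)
    box: Box   # location information reported for that token


def union_box(boxes: List[Box]) -> Box:
    """Return the smallest box enclosing every box in the list."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def reclassify_regions(
    tokens: List[OcrToken],
    plausibility: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the patent
) -> List[Tuple[str, Box]]:
    """Label consecutive runs of tokens as 'text' or 'image/noise'.

    A token whose plausibility score falls below `threshold` is treated
    as coming from a region misclassified as text. Adjacent tokens with
    the same label are merged into one region using their boxes.
    """
    regions: List[Tuple[str, Box]] = []
    run_label: Optional[str] = None
    run_boxes: List[Box] = []
    for tok in tokens:
        label = "text" if plausibility(tok.text) >= threshold else "image/noise"
        if label != run_label and run_boxes:
            # The label changed, so close the current run.
            regions.append((run_label, union_box(run_boxes)))
            run_boxes = []
        run_label = label
        run_boxes.append(tok.box)
    if run_boxes:
        regions.append((run_label, union_box(run_boxes)))
    return regions


# Toy stand-in for a language model: plain vocabulary membership.
VOCAB = {"the", "quick", "brown", "fox"}


def toy_plausibility(token: str) -> float:
    return 1.0 if token.lower() in VOCAB else 0.0
```

Calling `reclassify_regions(tokens, toy_plausibility)` on a line whose middle tokens are OCR garbage yields three regions: text, image/noise, text. A real system would replace `toy_plausibility` with an n-gram or neural language model score and would probably score whole strings rather than isolated tokens.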

Bibliographic Data

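  • Publication No.: KR101028670B1

    Patent type:

  • Publication date: 2011-04-12

    Original format: PDF

  • Applicant/Patentee:

    Application No.: KR20080103890

  • Filing date: 2008-10-22

  • Classification: G06F17/26; G06F17/27; G06F17/21

  • Country: KR

  • Date indexed: 2022-08-21 17:50:23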