DOCUMENT IMAGE PROCESSING APPARATUS, DOCUMENT IMAGE PROCESSING METHOD, DOCUMENT IMAGE PROCESSING PROGRAM, AND RECORDING MEDIUM ON WHICH DOCUMENT IMAGE PROCESSING PROGRAM IS RECORDED

Abstract

An image of a character string composed of M characters is clipped from a document image and divided into individual character images. Image features are extracted from each character image. Based on these features, the N (N > 1, integer) most similar character images are selected, in descending order of similarity, as candidate characters from a character image feature dictionary that stores image features on a per-character basis, and a first index matrix of M×N cells is prepared. The candidate character string formed by the candidate characters in the first column of the first index matrix is subjected to lexical analysis according to a language model, whereby a second index matrix whose first-column character string makes sense is prepared. In the language model, statistics are gathered first, and the lexical analysis is then performed.
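The two index matrices can be illustrated with a minimal Python sketch. The abstract does not specify the feature representation, the similarity measure, or the form of the language model, so everything below is an assumption made for illustration: features are plain numeric vectors, similarity is cosine similarity, the language model is a table of bigram counts, and the names `build_index_matrix` and `rerank_with_bigrams` are hypothetical. The second step uses a Viterbi-style search as a stand-in for the lexical analysis described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_index_matrix(char_features, dictionary, n):
    """First index matrix: M rows (one per character image), N columns
    (candidate characters in descending order of similarity)."""
    matrix = []
    for feat in char_features:
        ranked = sorted(dictionary,
                        key=lambda ch: cosine(feat, dictionary[ch]),
                        reverse=True)
        matrix.append(ranked[:n])
    return matrix

def rerank_with_bigrams(matrix, bigram_counts):
    """Second index matrix: pick one candidate per row so that the resulting
    string scores highest under a bigram-count model (an assumed stand-in for
    the language model), then move each picked candidate to column 0."""
    if not matrix:
        return []
    best = [[0.0] * len(matrix[0])]   # best[i][j]: best path score ending at row i, column j
    back = []                         # back[i-1][j]: column chosen in row i-1 on that path
    for i in range(1, len(matrix)):
        row_scores, row_back = [], []
        for cur in matrix[i]:
            scores = [best[i - 1][k] + math.log(1 + bigram_counts.get((prev, cur), 0))
                      for k, prev in enumerate(matrix[i - 1])]
            k_best = max(range(len(scores)), key=scores.__getitem__)
            row_scores.append(scores[k_best])
            row_back.append(k_best)
        best.append(row_scores)
        back.append(row_back)
    # Trace the best path backwards from the last row.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    chosen = [j]
    for row_back in reversed(back):
        j = row_back[j]
        chosen.append(j)
    chosen.reverse()
    return [[row[c]] + row[:c] + row[c + 1:] for row, c in zip(matrix, chosen)]
```

In this sketch the first column of the first matrix holds the visually closest candidates, and the first column of the second matrix holds the string favored by the bigram statistics. Ties are resolved in favor of the higher-ranked visual candidate, so rows are left unchanged when the language model has no preference.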