
METHOD AND SYSTEM FOR DIGITALIZING A LARGE VOLUME OF DOCUMENTS BASED ON CHARACTER RECOGNITION WITH ADAPTIVE TRAINING MODULE TO REAL DATA


Abstract

The present invention relates to the automatic generation of representative pattern models for a character recognition engine. It enables the efficient construction of a character recognizer for document digitization, in particular large-volume digitization with adaptive learning from the actual data encountered in various document digitization processes. The method comprises the steps of: extracting, from document data containing the structure and text information of the document images, the frequency of appearance of each character pattern contained in the target documents to be digitized; segmenting the individual image of each character pattern using the document structure and segmentation information; extracting statistical features of each character pattern image based on the individual image segmentation information; and generating, from the statistical features, a representative model for each character pattern, against which input character patterns are compared. The character recognition engine of the document digitization system according to the present invention can therefore be supplied with new representative pattern models derived from the actual data, maximizing its recognition performance.
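The four steps in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the abstract does not say which statistical features or comparison measure are used, so the bounding-box record `CharBox`, the zone-density feature, the per-feature mean/variance model, the `min_count` frequency threshold, and the diagonal Mahalanobis comparison are all illustrative assumptions chosen for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import fmean, pvariance

Image = list[list[int]]  # binary image: 1 = ink, 0 = background (assumption)


@dataclass
class CharBox:
    """Hypothetical record: one character's ground-truth label and bounding box."""
    label: str
    x: int
    y: int
    w: int
    h: int


def char_frequencies(boxes: list[CharBox]) -> Counter:
    """Step 1: frequency of appearance of each character in the ground-truth text."""
    return Counter(b.label for b in boxes)


def crop(page: Image, box: CharBox) -> Image:
    """Step 2: cut the individual character image out of the page using its box."""
    return [row[box.x:box.x + box.w] for row in page[box.y:box.y + box.h]]


def zone_density(img: Image, zones: int = 2) -> list[float]:
    """Step 3: feature vector = fraction of ink pixels in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            cells = [img[r][c]
                     for r in range(zr * h // zones, (zr + 1) * h // zones)
                     for c in range(zc * w // zones, (zc + 1) * w // zones)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def build_models(page: Image, boxes: list[CharBox], min_count: int = 1):
    """Step 4: representative model per character = per-feature mean and variance.
    Characters seen fewer than min_count times are skipped (hypothetical threshold)."""
    freq = char_frequencies(boxes)
    samples: dict[str, list[list[float]]] = {}
    for b in boxes:
        if freq[b.label] >= min_count:
            samples.setdefault(b.label, []).append(zone_density(crop(page, b)))
    models = {}
    for label, vecs in samples.items():
        cols = list(zip(*vecs))  # one tuple of values per feature dimension
        models[label] = ([fmean(c) for c in cols], [pvariance(c) for c in cols])
    return models


def classify(img: Image, models, eps: float = 1e-2) -> str:
    """Compare an input character image with every model (diagonal Mahalanobis distance)."""
    x = zone_density(img)

    def dist(model):
        mu, var = model
        return sum((xi - m) ** 2 / (v + eps) for xi, m, v in zip(x, mu, var))

    return min(models, key=lambda label: dist(models[label]))


# Toy "document": five 4x4 glyphs laid out left to right with a blank column between them.
L = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
SEVEN = [[1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 1]]
glyphs = [("L", L), ("7", SEVEN), ("L", L), ("7", SEVEN), ("L", L)]
page = [[0] * (5 * len(glyphs)) for _ in range(4)]
boxes = []
for i, (label, g) in enumerate(glyphs):
    for r in range(4):
        for c in range(4):
            page[r][5 * i + c] = g[r][c]
    boxes.append(CharBox(label, 5 * i, 0, 4, 4))

models = build_models(page, boxes)
print(classify(SEVEN, models))  # prints "7"
```

Modelling each character by the mean and variance of its features keeps the "representative model" compact and lets it be rebuilt cheaply whenever new labelled pages arrive, which is one plausible reading of the adaptive training to real data described above.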
