
Method and device for generating a fuzzy rule base for classifying logical structure features of printed documents


Abstract

In a first step, character recognition features are provided from a given printed document. In a second step, a number of physical structure features are determined for each line of the document on the basis of the provided character recognition features. In a third step, training data comprising an input-output sample are provided for each line, wherein the input is represented by the physical structure features and the output by a manually labelled logical structure feature. In a fourth step, a distribution is determined for each physical structure feature in the document. In a fifth step, for each line and each physical structure feature, a fuzzy set having linguistic variables and corresponding membership degrees is provided on the basis of the respective determined distribution. In a sixth step, for each line and each physical structure feature, the linguistic variable with the maximum membership degree is selected. In a seventh step, for each line, a fuzzy rule for the fuzzy rule base is generated on the basis of the input-output sample, wherein each physical structure feature of the input is represented by the corresponding selected linguistic variable with its membership degree and the output is represented by the manually labelled logical structure feature.
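
Steps four to seven of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things are assumptions made only for illustration: the three linguistic variables "low", "medium" and "high", the triangular membership functions placed at the mean and one standard deviation either side of it, the feature names `font_size` and `indent`, and all function names. The abstract does not specify the shape of the distribution or of the fuzzy sets. Steps one to three (character recognition, feature extraction, manual labelling) are assumed to have already produced one `(features, label)` pair per line.

```python
from statistics import mean, pstdev


def distribution(values):
    """Step 4: summarise one feature over the whole document as (mean, std)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # assumed fallback so constant features keep a nonzero width
    return mu, sigma


def triangle(x, centre, width):
    """Triangular membership: 1 at `centre`, falling to 0 at distance `width`."""
    return max(0.0, 1.0 - abs(x - centre) / width)


def fuzzify(x, mu, sigma):
    """Step 5: membership degree of x in each (assumed) linguistic variable."""
    return {
        "low": 1.0 if x <= mu - sigma else triangle(x, mu - sigma, sigma),
        "medium": triangle(x, mu, sigma),
        "high": 1.0 if x >= mu + sigma else triangle(x, mu + sigma, sigma),
    }


def generate_rules(lines):
    """Build one fuzzy rule per line from (features, label) training samples."""
    names = sorted(lines[0][0])
    dists = {n: distribution([feats[n] for feats, _ in lines]) for n in names}
    rules = []
    for feats, label in lines:
        antecedent = {}
        for n in names:
            degrees = fuzzify(feats[n], *dists[n])
            term = max(degrees, key=degrees.get)  # step 6: maximum membership
            antecedent[n] = (term, degrees[term])
        rules.append((antecedent, label))  # step 7: IF antecedent THEN label
    return rules


# Hypothetical training samples: one (physical features, manual label) pair per line.
sample = [
    ({"font_size": 18.0, "indent": 0.0}, "title"),
    ({"font_size": 10.0, "indent": 0.0}, "paragraph"),
    ({"font_size": 10.0, "indent": 0.0}, "paragraph"),
    ({"font_size": 10.0, "indent": 4.0}, "list_item"),
]
rules = generate_rules(sample)
```

Each entry of `rules` pairs a dictionary that maps a feature name to its selected linguistic variable and membership degree with the manually assigned label. A practical rule base would probably also merge duplicate or conflicting rules, which the abstract does not describe.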
