Alphabetic character word upper/lower case print convention apparatus and method

Abstract

The print convention apparatus and method disclosed herein decide whether an alphabetic character field output by an optical character reader (OCR) results from the OCR scan of an upper case or a lower case inscription on the scanned document. The alphabetic character field (e.g., a word) comprises one or a series of alphabetic characters that represent the OCR's interpretation of characters printed on the scanned document, so each word output by the OCR corresponds to a word imprinted on that document. The electrical signals output by the OCR, which represent upper and lower case alphabetic characters as well as rejects and conflicts, are applied to a character occurrence probability storage apparatus. This apparatus holds precomputed empirical probabilities that (1) a given character recognition is the result of the scan of an upper case character, and (2) a given character recognition is the result of the scan of a lower case character. It also holds probability values for character conflicts and rejects. As the alphabetic character signals from the OCR are applied character by character to the storage apparatus (e.g., a read-only store), a running sum of the respective probabilities for the upper case and lower case print conventions is developed. After the final character, reject, or conflict of a word has been input, an upper or lower case determination is made for all of the characters within the word. This determination corresponds to the print convention of the word inscribed on the scanned document.
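The per-word decision described above can be sketched in software roughly as follows. This is an illustration under stated assumptions, not the patented circuit: the table values are invented placeholders, `"*"` is a hypothetical symbol for a reject, and the sketch assumes the stored values are accumulated as logarithms so that a running sum corresponds to multiplying per-character likelihoods (the abstract only says that a running sum is developed).

```python
import math

# Hypothetical placeholder values, NOT taken from the patent.
# For each OCR output symbol: (probability under the upper-case print
# convention, probability under the lower-case print convention).
# "*" is an assumed marker for an OCR reject.
TABLE = {
    "C": (0.030, 0.004),
    "c": (0.006, 0.028),
    "O": (0.050, 0.008),
    "o": (0.010, 0.070),
    "A": (0.080, 0.001),
    "a": (0.001, 0.080),
    "T": (0.090, 0.002),
    "t": (0.002, 0.090),
    "*": (0.020, 0.020),
}


def classify_word(symbols):
    """Return 'UPPER' or 'LOWER' for one word of OCR output symbols."""
    upper_sum = 0.0  # running sum for the upper-case convention
    lower_sum = 0.0  # running sum for the lower-case convention
    for symbol in symbols:
        p_upper, p_lower = TABLE[symbol]
        # Assumption: logs are summed, which is equivalent to
        # multiplying the per-character probabilities.
        upper_sum += math.log(p_upper)
        lower_sum += math.log(p_lower)
    # The decision is made only after the last symbol of the word.
    # Ties go to 'UPPER' here; that choice is arbitrary.
    return "UPPER" if upper_sum >= lower_sum else "LOWER"


if __name__ == "__main__":
    for word in ("CoAT", "cat", "c*t"):
        print(word, classify_word(word))
```

With these placeholder values, a mostly upper-case word containing one lower-case recognition (`"CoAT"`) is still classified as upper case, which is the point of deciding per word instead of per character.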
An upper or lower case flag is generated from this print convention determination and associated with the alphabetic character word output from the print convention apparatus for further text processing. In one embodiment of the invention, the probability that each OCR output alphabetic character is an upper or lower case character is stored in respective upper and lower case character occurrence probability storage devices, having been precomputed as the product of two probability factors: (1) the likelihood that the OCR recognition resulted from the scan of an upper or lower case character, and (2) the likelihood of the given character occurring in a document in a specified language (e.g., English). In another embodiment of the invention, the character occurrence probability storage devices are functionally replaced by a read-only store with an address position for each upper and lower case alphabetic character output by the OCR, including conflicts and rejects. Each address position holds a precomputed numerical probability value representing the quotient of (1) the probability that a given character is related to an upper case print convention and (2) the probability that the same character is related to a lower case print convention.
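The two embodiments can be illustrated together in a short sketch. Everything numeric here is a made-up placeholder, and several details are assumptions the abstract does not settle: that the occurrence factor has one value per symbol per convention, that `"*"` denotes a reject, and that the stored quotients are combined across a word by summing their logarithms and comparing the total with zero.

```python
import math

# Hypothetical placeholder factors, NOT from the patent. Each entry is
# (value for upper-case convention, value for lower-case convention).
# Factor 1: likelihood that this OCR recognition came from a scan of an
# upper / lower case character.
RECOGNITION = {
    "S": (0.90, 0.15),
    "s": (0.10, 0.85),
    "E": (0.98, 0.01),
    "e": (0.02, 0.99),
    "*": (0.05, 0.05),  # assumed symbol for an OCR reject
}
# Factor 2: likelihood of the character occurring in the language.
# Indexing it per convention is an assumption of this sketch.
OCCURRENCE = {
    "S": (0.07, 0.06),
    "s": (0.07, 0.06),
    "E": (0.11, 0.13),
    "e": (0.11, 0.13),
    "*": (1.00, 1.00),
}


def product_tables():
    """First embodiment: precompute each entry as a product of two factors."""
    upper, lower = {}, {}
    for symbol, (rec_up, rec_lo) in RECOGNITION.items():
        occ_up, occ_lo = OCCURRENCE[symbol]
        upper[symbol] = rec_up * occ_up
        lower[symbol] = rec_lo * occ_lo
    return upper, lower


def ratio_table(upper, lower):
    """Second embodiment: one stored quotient (upper / lower) per symbol."""
    return {symbol: upper[symbol] / lower[symbol] for symbol in upper}


UPPER, LOWER = product_tables()
RATIOS = ratio_table(UPPER, LOWER)


def classify_by_ratio(symbols):
    """Decide the print convention of a word from the single ratio table."""
    # Assumption: quotients are combined by summing logs; a total >= 0
    # means the upper-case convention is at least as likely.
    total = sum(math.log(RATIOS[symbol]) for symbol in symbols)
    return "UPPER" if total >= 0.0 else "LOWER"


if __name__ == "__main__":
    for word in ("SEE", "see", "sEE", "S*e"):
        print(word, classify_by_ratio(word))
```

Storing a single quotient per symbol halves the number of table lookups per character compared with keeping separate upper and lower tables, which may be one motivation for the read-only store variant; the abstract itself does not state the reason.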
