
Method of automatic language identification for multi-lingual text recognition


Abstract

The disclosed invention uses an approach based on a complex estimation to identify the languages of portions of a multi-lingual text recognized from a bit-mapped image. Besides traditional steps such as document segmentation, the method comprises new ones, such as generating and testing hypotheses about the characters in the word tokens.

The method further includes defining a set of selected language models, estimating words via the language models, defining a set of dictionaries for language selection, estimating the correspondence of each word with the chosen languages, and calculating a complex estimation for the word that takes into account most or all of the above-mentioned estimations.

The complex estimation may also include factors for the mutual correspondence of characters and/or words within the line and/or the text, the mutual geometric correspondence of characters within the word and/or the line, the linguistic correspondence of the word with its neighbors, and an estimation of the accuracy with which the image of the word token is reconstructed in the presence of distortion.
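The abstract does not say how the individual estimations are combined, so the following is only a minimal sketch of the general idea, under stated assumptions: each language is represented by an alphabet (a crude stand-in for a language model) and a small word list (a stand-in for a dictionary), and the complex estimation is taken to be a weighted sum of the two scores. The names `Language`, `model_score`, `dictionary_score`, `complex_estimate`, and `identify_language`, the weights, and the sample word lists are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    alphabet: frozenset    # assumption: alphabet membership stands in for a language model
    dictionary: frozenset  # assumption: a tiny word list stands in for a dictionary


def model_score(word: str, lang: Language) -> float:
    """Fraction of the word's characters that belong to the language's alphabet."""
    if not word:
        return 0.0
    return sum(ch in lang.alphabet for ch in word.lower()) / len(word)


def dictionary_score(word: str, lang: Language) -> float:
    """1.0 if the word is in the language's dictionary, otherwise 0.0."""
    return 1.0 if word.lower() in lang.dictionary else 0.0


def complex_estimate(word: str, lang: Language,
                     w_model: float = 0.5, w_dict: float = 0.5) -> float:
    """Assumed combination rule: a weighted sum of the individual estimations."""
    return w_model * model_score(word, lang) + w_dict * dictionary_score(word, lang)


def identify_language(hypotheses, languages):
    """Test every (character hypothesis, language) pair and keep the best one.

    `hypotheses` are candidate readings of a single word token.
    Returns a tuple (language name, chosen reading, score).
    """
    candidates = [
        (lang.name, reading, complex_estimate(reading, lang))
        for reading in hypotheses
        for lang in languages
    ]
    return max(candidates, key=lambda c: c[2])


LATIN = frozenset("abcdefghijklmnopqrstuvwxyz")
ENGLISH = Language("English", LATIN, frozenset({"the", "word", "text"}))
GERMAN = Language("German", LATIN | frozenset("äöüß"),
                  frozenset({"das", "wort", "text", "straße"}))

# Two competing readings of one word token, scored against both languages.
result = identify_language(["straße", "strabe"], [ENGLISH, GERMAN])
print(result)
```

The further factors mentioned in the abstract (geometric correspondence of characters, agreement with neighboring words, reconstruction accuracy of the word image) could be added as extra weighted terms in `complex_estimate`; they are omitted here because the abstract gives no detail on how they are computed.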
