首页> 外国专利> Systems and methods for merging word fragments in optical character recognition-extracted data

Systems and methods for merging word fragments in optical character recognition-extracted data

摘要

Systems and methods for merging adjacent word fragments in outputs of optical character recognition (OCR) systems can include a processor obtaining word fragments associated with OCR data generated from an image. Each word fragment can be associated with a respective text line of a plurality of text lines. The at least one processor can determine, for each pair of adjacent word fragments in a text line, a respective normalized horizontal distance between the pair of adjacent word fragments. The processor can identify one or more pairs of adjacent word fragments that are candidates for merging based on the determined normalized horizontal distances. The processor can determine that a pair of adjacent word fragments, among the one or more pairs of adjacent word fragments that are candidates for merging, matches a predefined expression of a plurality of predefined expressions, and merge that pair of adjacent word fragments into a single word.

著录项

相似文献

  • 专利
  • 外文文献
获取专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号