Journal: 《计算机技术与发展》 (Computer Technology and Development)

Research on OCR Technology for Mixed Chinese-English Layouts Based on Feedback Merging

Abstract

Optical character recognition (OCR) is now widely used across many areas of daily life, and OCR for a single character set has seen major breakthroughs. However, because layout analysis differs markedly between Chinese and English, existing OCR techniques for mixed Chinese-English text perform unsatisfactorily. To address the shortcomings of traditional OCR methods, and building on a study of the segmentation difficulties in layout analysis of mixed Chinese-English documents, this paper proposes an improved layout analysis and segmentation method for mixed Chinese-English documents based on feedback merging. After preprocessing with Canny-operator-based image binarization and median filtering, the method segments character regions in two passes using the projection method, and the specific segmentation techniques are studied in some depth. Comparative experiments show that the proposed method successfully separates Chinese, English, and numeric characters in mixed documents. Its accuracy reaches 97%, about 8 percentage points higher than the traditional method, and it largely resolves the traditional methods' poor handling of touching characters.
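The two-pass projection segmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual feedback-merging criterion, so the merge rule below (join neighbouring pieces when the gap is small and the merged width stays near one character box) is an assumption made purely for illustration. All function names and the `max_gap` and `ratio` parameters are hypothetical. The input is assumed to be an already binarized image (1 = ink, 0 = background), so the Canny and median-filter preprocessing is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def split_on_gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs where the profile is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def merge_narrow(runs, line_height, max_gap=1, ratio=1.1):
    """Assumed merge rule, not the paper's: join a run to its left
    neighbour when the gap between them is at most max_gap and the
    merged width does not exceed ratio * line_height."""
    merged = []
    for run in runs:
        if merged:
            prev_start, prev_end = merged[-1]
            gap = run[0] - prev_end
            width = run[1] - prev_start
            if gap <= max_gap and width <= ratio * line_height:
                merged[-1] = (prev_start, run[1])
                continue
        merged.append(run)
    return merged


def segment(img: Image, max_gap=1, ratio=1.1):
    """Pass 1: horizontal projection splits the page into text lines.
    Pass 2: vertical projection splits each line into pieces, which
    are then merged. Returns (top, bottom, left, right) boxes."""
    boxes = []
    for top, bottom in split_on_gaps([sum(row) for row in img]):
        line = img[top:bottom]
        runs = split_on_gaps(vertical_projection(line))
        for left, right in merge_narrow(runs, bottom - top, max_gap, ratio):
            boxes.append((top, bottom, left, right))
    return boxes
```

In this sketch, a left-right structured Chinese character that a plain vertical projection would cut into two pieces is rejoined because the merged width still fits a roughly square box, while a narrow Latin letter separated by a wider gap stays on its own. A real system would need a more careful criterion, for example one informed by recognition confidence.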
