Journal: 《计算机应用与软件》 (Computer Applications and Software)

An Italic Detection Method Based on English Characters

Abstract

English documents often use italic characters to highlight and emphasise important content. In an optical character recognition (OCR) system, however, the training samples typically do not include italic characters, so the system cannot recognise them correctly. Adding italic characters to the training samples increases the complexity of the sample set and can also degrade recognition of upright (roman) characters. To address this problem, we propose a method for detecting and correcting English italics. First, each text line is split into words, and each word is further subdivided into individual characters. The morphological features of each character are then examined, and the shape of the word (upright or italic) is determined from them. Finally, all words detected as italic are collected and used to calculate the precise slant angle of the italic characters, which is then corrected. Experimental results show that the method achieves good detection and correction results.
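
The abstract does not say which character features or which angle computation the authors use, so the following is only a minimal sketch of the overall pipeline under stated assumptions, not the paper's method. It swaps in a common projection-profile technique: a word image is de-sheared at several candidate angles, and the angle that gives the sharpest vertical projection is taken as its slant. The function names, the candidate angle range, and the 8-degree italic threshold are all hypothetical, and word segmentation is assumed to have been done already.

```python
import math
import numpy as np

def shear_rows(img, angle_deg):
    """Shift each row left so strokes leaning angle_deg to the right become vertical."""
    h, w = img.shape
    t = math.tan(math.radians(float(angle_deg)))
    pad = int(math.ceil(abs(t) * h))          # room for the largest row shift
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for y in range(h):
        # Rows further above the baseline (bottom row) lean further right.
        shift = int(round((h - 1 - y) * t))
        start = pad - shift
        out[y, start:start + w] = img[y]
    return out

def estimate_slant(img, angles=range(-5, 31)):
    """Return the candidate angle (degrees) whose de-sheared image has the
    most concentrated vertical projection (largest sum of squared column sums)."""
    best_angle, best_score = 0.0, -1.0
    for a in angles:
        profile = shear_rows(img, a).sum(axis=0).astype(float)
        score = float((profile ** 2).sum())
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle

def detect_and_correct(word_images, threshold_deg=8.0):
    """Classify each binary word image as italic or upright (assumed threshold),
    pool the italic words into one slant angle, and de-shear only those words."""
    angles = [estimate_slant(w) for w in word_images]
    italic = [a for a in angles if a >= threshold_deg]
    if not italic:
        return list(word_images), None
    angle = float(np.median(italic))          # one shared angle for all italic words
    corrected = [shear_rows(w, angle) if a >= threshold_deg else w
                 for w, a in zip(word_images, angles)]
    return corrected, angle
```

Pooling the per-word estimates into a single angle mirrors the abstract's final step, where the italic words together determine the correction angle; the median is just one robust way to do that pooling.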
