Optical character recognition (OCR) technology currently struggles to recognize mathematical formulas in Chinese documents. To address this, we propose an improved algorithm for recognizing mathematical formulas embedded in printed text. The algorithm adds new feature values to classify text lines, segments lines containing embedded formulas character by character, and then extracts the mathematical formulas from the classified text lines in turn. Experimental results show that the algorithm achieves a high recognition rate and is efficient, and that it has a clear advantage over existing comparable algorithms in recognizing mathematical formulas in printed Chinese documents.
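The abstract does not say how a formula line is split into characters. As a rough illustration only, the sketch below uses a vertical projection profile, a common segmentation technique that is assumed here and is not confirmed to be the authors' method. The function name `segment_columns` and the input format (a binarised line image as nested lists, 1 for ink and 0 for background) are hypothetical.

```python
from typing import List, Tuple


def segment_columns(line: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binarised text line into character spans.

    Assumption: characters are separated by at least one column that
    contains no ink. Returns half-open column ranges (start, end).
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in line) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                 # a character begins at this column
        elif ink == 0 and start is not None:
            spans.append((start, x))  # a blank column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans
```

For a two-row image `[[0, 1, 1, 0, 0, 1, 0], [0, 1, 0, 0, 0, 1, 1]]`, the function returns `[(1, 3), (5, 7)]`. Touching or overlapping symbols, which are common in formulas (fractions, radicals, subscripts), would need connected-component analysis or a similar refinement that this sketch does not attempt.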