Conference: 2015 Eighth International Conference on Advances in Pattern Recognition

Automated segmentation and classification of chemical and other equations from document images

Abstract

Segmentation of mathematical equations from document images is already a major research area aimed at improving the performance of OCR systems. Although chemical equations share similar spatial properties with non-chemical equations (for example, mathematical equations), methods for segmenting them remain largely unexplored. This paper presents a novel method for segmenting and identifying chemical and other equations in heterogeneous document images that may contain graphics, tables, and text, and then classifying them into two categories: chemical and non-chemical equations. To the best of our knowledge, this study is the first of its kind; it not only improves OCR performance but can also lead to the creation of chemical databases and the formation of bond electron matrices from chemical equations or formulae. In the proposed method, equations are extracted using morphological operators and histogram analysis, and the extracted equations are classified using an open-source OCR engine. The effectiveness of the proposed method is demonstrated by testing it on 152 document images. Test results show accuracies of 97.4% and 97.45% for segmentation and classification, respectively.
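
The abstract names morphological operators and histogram analysis but gives no implementation details. As an illustration only, the following minimal Python sketch shows how those two general techniques are commonly combined: a horizontal dilation merges nearby symbols into one blob, and a row-wise projection histogram locates line bands. The function names, the structuring-element width, and the toy image are all assumptions made for this example; this is not the authors' pipeline.

```python
import numpy as np

def dilate_horizontal(binary, width):
    """Binary dilation with a horizontal structuring element.

    The effective element width is 2 * (width // 2) + 1 pixels.
    """
    half = width // 2
    padded = np.pad(binary, ((0, 0), (half, half)), mode="constant")
    out = np.zeros_like(binary)
    # OR together every horizontally shifted copy of the image.
    for shift in range(2 * half + 1):
        out |= padded[:, shift:shift + binary.shape[1]]
    return out

def row_bands(binary, min_ink=1):
    """Return (start, end) row ranges whose projection histogram has ink.

    `end` is exclusive, so a band covers rows start .. end - 1.
    """
    profile = binary.sum(axis=1)  # ink pixels per row (projection histogram)
    has_ink = profile >= min_ink
    bands, start = [], None
    for r, flag in enumerate(has_ink):
        if flag and start is None:
            start = r                      # a band begins
        elif not flag and start is not None:
            bands.append((start, r))       # a band ends
            start = None
    if start is not None:
        bands.append((start, len(has_ink)))
    return bands

# Hypothetical toy page: True marks an ink pixel.
page = np.zeros((10, 12), dtype=bool)
page[1:3, 2:5] = True    # first line, left symbol
page[1:3, 7:10] = True   # first line, right symbol
page[6:8, 3:9] = True    # second line

bands = row_bands(page)               # candidate line bands from the histogram
merged = dilate_horizontal(page, 5)   # bridges the gap between the two symbols
print(bands)
```

In a fuller pipeline, each candidate band would presumably be examined further (for example by its width, isolation, or OCR output) before being accepted as an equation; the abstract does not say which criteria the authors use.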