A Database of Glyphs for OCR of Mathematical Documents

Abstract

Automatic document analysis tools for mathematical texts are necessary to enlarge the pool of mathematical knowledge available in electronic form. However, the development of such tools is currently hindered by the weakness of optical character recognition (OCR) systems in dealing with the large range of mathematical symbols and with the often subtle but important distinctions in font usage in mathematical texts. Research on better OCR systems for mathematics depends crucially on an extensive, high-quality database of the glyphs used in mathematical texts, for both training and testing. We present such a database of symbols, constructed from a large set of characters available in the LaTeX document preparation system, which can serve as a basis for mathematical text recognition. We describe its integration into a prototype OCR system for mathematics that constructs LaTeX source documents from mathematical documents available as images. From the lessons learned in this work, we derive a road map for further research in mathematical text analysis.
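To make the idea of a glyph database concrete, here is a minimal Python sketch of how such a database might drive recognition. Each entry pairs the LaTeX source of a glyph and the font it was rendered in with a feature vector, and an unknown glyph is labeled by nearest-neighbor lookup. The `GlyphEntry` record, the three toy features (aspect ratio, ink density, stroke width), the font names, and all numeric values are hypothetical illustrations; they are not the authors' database format or classifier. The entries `a` and `\mathbf{a}` are kept distinct to mirror the abstract's point that font differences carry meaning in mathematics.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class GlyphEntry:
    """One hypothetical database record: a glyph and how to typeset it."""
    latex: str                    # LaTeX source that produces the glyph
    font: str                     # font the glyph was rendered in (illustrative)
    features: Tuple[float, ...]   # toy features: aspect ratio, ink density, stroke width


# Hypothetical entries; the numbers are made up for illustration only.
# "a" and "\mathbf{a}" are separate entries because the font changes the meaning.
DB: List[GlyphEntry] = [
    GlyphEntry("a", "cmr10", (0.75, 0.30, 0.10)),
    GlyphEntry(r"\mathbf{a}", "cmbx10", (0.80, 0.45, 0.18)),
    GlyphEntry(r"\alpha", "cmmi10", (1.10, 0.28, 0.09)),
]


def nearest_glyph(db: Sequence[GlyphEntry], features: Sequence[float]) -> GlyphEntry:
    """Return the entry whose feature vector is closest in Euclidean distance."""
    return min(db, key=lambda entry: math.dist(entry.features, features))


def transcribe(db: Sequence[GlyphEntry], glyphs: Sequence[Sequence[float]]) -> str:
    """Map a left-to-right sequence of glyph feature vectors to LaTeX tokens."""
    return " ".join(nearest_glyph(db, f).latex for f in glyphs)


if __name__ == "__main__":
    # Two unknown glyphs: one close to \alpha, one close to bold a.
    print(transcribe(DB, [(1.05, 0.27, 0.09), (0.79, 0.44, 0.17)]))
    # prints: \alpha \mathbf{a}
```

A real system would extract features from segmented glyph images and use a much richer classifier and layout analysis; the sketch only shows how one database can supply both labeled examples and the LaTeX needed to regenerate source.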