Conference: Document Recognition III

Progress in recognizing typeset mathematics

Abstract

Printed mathematics has a number of features that distinguish it from conventional text. These include two-dimensional structure (fractions, exponents, limits), frequent font changes, symbols of variable shape (quotient bars), and notational conventions that differ substantially from source to source. Compounded with more generic problems such as noise and merged or broken characters, printed mathematics offers a challenging arena for recognition. Our project was initially driven by the goal of scanning and parsing some 5,000 pages of elaborate mathematics (tables of definite integrals). While our prototype system succeeds in translating noise-free typeset equations into Lisp expressions suitable for further processing, a more semantic, top-down approach appears necessary for higher levels of performance. Such an approach may benefit from the incorporation of these programs into a more general document-processing viewpoint. We intend to release our somewhat refined prototypes to the public as utility programs, in the hope that they will be of general use in the construction of custom OCR packages. These utilities are quite fast even as originally prototyped in Lisp, where they may be of particular interest to those working on 'intelligent' optical processing. Some routines have also been rewritten in C++. Additional programs providing formula recognition and parsing also form part of this system. It is important to realize, however, that distinct and conflicting grammars are needed to cover variations in contemporary and historical typesetting, and thus a single simple solution is not possible.
