Source: Document Recognition II (conference proceedings)

Text/graphics separation for technical papers

Abstract

One of the important operations in the automatic analysis of technical papers is the separation of text from graphics. In practice, skew often occurs both in the original document and in its scanned image. Text and graphic blocks may also be non-rectangular. In these cases, standard text/graphics separation methods such as projection profiles or run-length smoothing are not always suitable. In this paper, we propose a text/graphics separation algorithm based on two simple, standard properties of technical paper pages, which we call the area property and the text compactness property. The area property takes into account the geometrical relationships between text and graphics. The text compactness property reflects the spatial relationships between text components within a block and between text and graphics. Applying both properties allows us to perform the separation accurately in the cases above. No skew correction is required before separation, and text and graphic blocks can have arbitrary shape.
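
For readers unfamiliar with run-length smoothing, one of the standard methods the abstract mentions, the following is a minimal sketch of its one-dimensional step. It is not the algorithm proposed in the paper. The function names `smooth_row` and `smooth_rows`, the 0/1 pixel encoding, and the choice to fill only gaps bounded by foreground on both sides are assumptions made for illustration.

```python
def smooth_row(row, threshold):
    """Fill short background gaps in one binary row.

    Assumed encoding: 1 = foreground (ink), 0 = background.
    A run of 0s is filled with 1s when it lies between two 1s
    and its length is at most `threshold`.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def smooth_rows(image, threshold):
    """Apply the horizontal smoothing step to every row of a binary image."""
    return [smooth_row(row, threshold) for row in image]


if __name__ == "__main__":
    # The gap of length 2 is filled; the gap of length 4 is kept.
    print(smooth_row([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], 2))
```

Because the smoothing runs along pixel rows, a skewed text line spreads across several rows and its characters no longer merge into one horizontal block, which is consistent with the abstract's remark that such methods are not always suitable for skewed pages.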