Journal: Pattern Recognition Letters

An adaptive over-split and merge algorithm for page segmentation

Abstract

Page segmentation is a key step in building a document recognition system. Variation in character font sizes, narrow spacing between text blocks, and complicated page structure are the main causes of the two most common error types: over-segmentation and under-segmentation. We propose an adaptive over-split and merge algorithm that reduces both types of error simultaneously. The document image is first over-split into text blocks, or even into individual text lines. These text blocks are then considered for merging into text regions using a new adaptive thresholding method. Finally, local context analysis uses a set of text line separators to split homogeneous text regions, which contain closely spaced text blocks of similar font size, into paragraphs. Experiments on the ICDAR2009 and UW-III benchmarking datasets show that the proposed algorithm reduces both under-segmentation and over-segmentation errors and boosts performance significantly when compared with popular page segmentation algorithms. (C) 2016 Elsevier B.V. All rights reserved.
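
The abstract does not give the details of the adaptive thresholding method, so the following is only a minimal sketch of the general over-split and merge idea, not the authors' algorithm. It assumes the page has already been over-split into text-line bounding boxes and merges vertically adjacent lines when their heights are similar and the gap between them is small relative to the local line height. The names `Box`, `should_merge`, `merge_lines`, and the parameters `gap_factor` and `size_ratio` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one text line (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width of the shared x-range; zero or negative means no overlap."""
    return min(a.x1, b.x1) - max(a.x0, b.x0)


def should_merge(upper: Box, lower: Box,
                 gap_factor: float = 1.0, size_ratio: float = 1.3) -> bool:
    """Decide whether `lower` continues the region ending at `upper`.

    Assumption: the threshold adapts to the local line height, which
    stands in for font size. This is an illustrative rule only.
    """
    small, large = sorted((upper.height, lower.height))
    if small <= 0 or large / small > size_ratio:
        return False  # font sizes differ too much
    if horizontal_overlap(upper, lower) <= 0:
        return False  # lines sit in different columns
    gap = lower.y0 - upper.y1
    return gap <= gap_factor * small  # gap is small relative to line height


def merge_lines(lines: list[Box]) -> list[list[Box]]:
    """Greedily group over-split text lines into text regions."""
    regions: list[list[Box]] = []
    for line in sorted(lines, key=lambda b: b.y0):
        for region in regions:
            if should_merge(region[-1], line):
                region.append(line)
                break
        else:
            regions.append([line])
    return regions
```

Because the gap threshold scales with line height, a heading in a large font is kept apart from body text even when the spacing is narrow, while body lines in the same column are merged. A later splitting step, such as the paper's local context analysis, would still be needed to separate paragraphs within each merged region.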