International Conference on Document Analysis and Recognition

Exploiting Stroke Orientation for CRF based Binarization of Historical Documents

Abstract

We present a novel binarization method that is especially effective on historical documents with the following characteristics: (a) the documents contain free-form cursive handwritten text with a significant but consistent slant; (b) scanning artifacts cause text and background pixels to have non-uniform intensity, even within the same page; and (c) the pages contain a significant amount of bleed-through from the other side of the page. To tackle non-uniform text and background intensity, we use a thresholding algorithm that works equally well on regions of the page that contain text and on regions that contain none. We then combine this algorithm with a CRF-based framework that handles bleed-through using a novel approach, further improving the quality of binarization. We compare the proposed algorithm against other popular binarization algorithms, both qualitatively, using examples, and quantitatively, using the word error rate (WER) obtained when the BBN Byblos Offline Handwritten Text Recognition (OHR) system performs optical character recognition (OCR) on the binarized text.
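The abstract uses WER as its quantitative yardstick but does not spell out how it is computed. Under the standard definition, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment of the OCR output against a reference transcript of N words. The sketch below is a minimal illustration under that definition, assuming plain whitespace tokenization; the function name `word_error_rate` is hypothetical and is not part of BBN Byblos or the authors' evaluation code, which may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization; real evaluations may also normalize
    case and punctuation before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling dynamic-programming row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from OCR output
                curr[j - 1] + 1,     # insertion: extra word in OCR output
                prev[j - 1] + cost,  # substitution (cost 1) or exact match (cost 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

Because insertions are counted, WER can exceed 1.0 when the OCR output contains many spurious words, which is one reason a lower WER on binarized pages is a useful indirect signal that noise such as bleed-through has been suppressed rather than transcribed.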