Journal: IEEE Transactions on Image Processing

Text Segmentation for MRC Document Compression

Abstract

The mixed raster content (MRC) standard (ITU-T T.44) specifies a framework for document compression that can dramatically improve the compression/quality tradeoff compared with traditional lossy image compression algorithms. The key to MRC compression is the separation of the document into foreground and background layers, with the separation represented as a binary mask. The resulting quality and compression ratio of an MRC document encoder are therefore highly dependent on the segmentation algorithm used to compute the binary mask. In this paper, we propose a novel multiscale segmentation scheme for MRC document encoding based on the sequential application of two algorithms. The first algorithm, cost optimized segmentation (COS), is a blockwise segmentation algorithm formulated in a global cost optimization framework. The second algorithm, connected component classification (CCC), refines the initial segmentation by classifying feature vectors of connected components using a Markov random field (MRF) model. The combined COS/CCC segmentation algorithms are then incorporated into a multiscale framework to improve the segmentation accuracy for text of varying size. In comparisons with state-of-the-art commercial MRC products and selected segmentation algorithms from the literature, we show that the new algorithm achieves greater text detection accuracy together with a lower false detection rate for nontext features. We also demonstrate that the proposed segmentation algorithm can improve the quality of decoded documents while simultaneously lowering the bit rate.
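
The role of the binary mask can be illustrated with a minimal sketch. It is not the COS/CCC algorithm from the paper: the segmenter here is a plain global threshold used as a stand-in, and the names `threshold_mask`, `split_layers`, `merge_layers`, and the `fill` value are hypothetical, chosen only for illustration. The sketch shows only how a mask splits a grayscale page into foreground and background layers and how a decoder would recombine them.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale pixels, 0 (black) to 255 (white)
Mask = List[List[bool]]   # True = foreground (text), False = background


def threshold_mask(image: Image, threshold: int = 128) -> Mask:
    """Stand-in segmenter (assumption): dark pixels are treated as text."""
    return [[pixel < threshold for pixel in row] for row in image]


def split_layers(image: Image, mask: Mask, fill: int = 0) -> Tuple[Image, Image]:
    """Split the page into foreground and background layers using the mask.

    Pixels that do not belong to a layer are set to `fill`, a placeholder
    value the decoder never reads.
    """
    foreground = [[p if m else fill for p, m in zip(row, mask_row)]
                  for row, mask_row in zip(image, mask)]
    background = [[fill if m else p for p, m in zip(row, mask_row)]
                  for row, mask_row in zip(image, mask)]
    return foreground, background


def merge_layers(foreground: Image, background: Image, mask: Mask) -> Image:
    """Decoder side: take each pixel from the layer the mask selects."""
    return [[f if m else b for f, b, m in zip(f_row, b_row, mask_row)]
            for f_row, b_row, mask_row in zip(foreground, background, mask)]


# Tiny 2x3 "page": dark values stand for text, bright values for paper.
page = [[250, 20, 245],
        [30, 240, 25]]
mask = threshold_mask(page)
fg, bg = split_layers(page, mask)
restored = merge_layers(fg, bg, mask)
```

Because this sketch applies no compression, `restored` equals `page` exactly. In an actual encoder each layer and the mask would be compressed separately, so a mask that misclassifies pixels would affect both the decoded quality and the bit rate, which is why the abstract puts the emphasis on segmentation.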
