Text Segmentation for MRC Document Compression

Haneda E.; Bouman C. A.


Abstract

The mixed raster content (MRC) standard (ITU-T T.44) specifies a framework for document compression that can dramatically improve the compression/quality tradeoff compared to traditional lossy image compression algorithms. The key to MRC compression is the separation of the document into foreground and background layers, represented as a binary mask. Therefore, the resulting quality and compression ratio of an MRC document encoder are highly dependent upon the segmentation algorithm used to compute the binary mask. In this paper, we propose a novel multiscale segmentation scheme for MRC document encoding based upon the sequential application of two algorithms. The first algorithm, cost optimized segmentation (COS), is a blockwise segmentation algorithm formulated in a global cost optimization framework. The second algorithm, connected component classification (CCC), refines the initial segmentation by classifying feature vectors of connected components using a Markov random field (MRF) model. The combined COS/CCC segmentation algorithms are then incorporated into a multiscale framework in order to improve the segmentation accuracy for text of varying size. In comparisons with state-of-the-art commercial MRC products and selected segmentation algorithms in the literature, we show that the new algorithm achieves greater accuracy of text detection together with a lower false detection rate of nontext features. We also demonstrate that the proposed segmentation algorithm can improve the quality of decoded documents while simultaneously lowering the bit rate.
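To make the role of the binary mask concrete, the following is a minimal sketch of splitting a page into foreground and background layers and recombining them. It is not the COS/CCC method described in the abstract: `threshold_mask` is a hypothetical stand-in that uses a fixed global threshold, and the function names, the `fill` placeholder value, and the list-of-lists image representation are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)
Mask = List[List[int]]   # 1 = foreground (e.g. text), 0 = background


def threshold_mask(image: Image, threshold: int = 128) -> Mask:
    """Toy segmentation (assumption): dark pixels become foreground.

    This is only a placeholder for a real segmentation algorithm.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def split_layers(image: Image, mask: Mask, fill: int = 0) -> Tuple[Image, Image]:
    """Route each pixel to the foreground or background layer by the mask.

    Pixels not selected for a layer are set to `fill`, a placeholder for
    the "don't care" regions of that layer.
    """
    foreground = [
        [p if m else fill for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    background = [
        [fill if m else p for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    return foreground, background


def recombine(foreground: Image, background: Image, mask: Mask) -> Image:
    """Decoder side: the mask selects which layer supplies each pixel."""
    return [
        [f if m else b for f, b, m in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(foreground, background, mask)
    ]


page = [[250, 20, 240], [10, 245, 30]]
mask = threshold_mask(page)
fg, bg = split_layers(page, mask)
restored = recombine(fg, bg, mask)
```

In this lossless toy setting `restored` equals `page`. In an actual encoder each layer would be compressed separately, so a mask that misclassifies pixels affects both the decoded quality and the bit rate, which is why the abstract emphasizes the segmentation step.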


Bibliographic record

Source
IEEE Transactions on Image Processing | 2011, Issue 6 | pp. 1611-1626 | 16 pages
Authors
Haneda E.; Bouman C. A.
Affiliation

School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Original format: PDF
Language: English
Keywords
Document compression; MRC compression; Markov random fields; multiscale image analysis; image segmentation


