Farsi and Arabic document images lossy compression based on the mixed raster content model

Hadi Grailu; Mojtaba Lotfizad; Hadi Sadoghi-Yazdi

Abstract

The mixed raster content (MRC) model was recently proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have several disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods focus on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off, but they do not consider text readability or OCR facility. Third, these methods usually suffer from undesired jaggy artifacts and from misclassification of important textual details. This paper proposes an MRC-based document image compression method that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in segmentation, bi-level mask layer compression, OCR facility, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sufficiently sensitive to small textual objects. Experimental results show that the proposed method has considerably higher compression performance than the state-of-the-art compression method DjVu, as high as 1.75-2.3.
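
The three-layer idea behind the MRC model mentioned in the abstract can be illustrated with a minimal sketch: a bi-level mask selects, per pixel, between a foreground layer and a background layer. This is not the segmentation or compression method of the paper. The fixed global threshold, the function names `mrc_decompose` and `mrc_compose`, and the tiny sample image are all assumptions made for illustration only.

```python
# Illustrative sketch of the three-layer MRC model (mask, foreground, background).
# Assumption: a fixed global threshold stands in for a real segmentation step;
# the paper's own segmentation method is not reproduced here.

def mrc_decompose(image, threshold=128):
    """Split a grayscale image (rows of 0-255 ints) into MRC layers."""
    # Bi-level mask: 1 marks dark (text-like) pixels, 0 marks everything else.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]
    # Each layer only needs valid values where the mask selects it;
    # the remaining "don't care" pixels are filled with a flat value.
    foreground = [[px if m else 0 for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[255 if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background

def mrc_compose(mask, foreground, background):
    """Rebuild the image: take foreground where mask is 1, else background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]

# Hypothetical 3x4 grayscale patch with a dark stroke on a light page.
image = [
    [250, 250, 20, 250],
    [250, 30, 25, 250],
    [245, 250, 22, 240],
]
mask, fg, bg = mrc_decompose(image)
restored = mrc_compose(mask, fg, bg)
```

In a real MRC codec each layer would then be compressed with a coder suited to it, for example a bi-level coder for the mask and a continuous-tone coder for the other two layers; the sketch leaves the layers uncompressed, so the reconstruction here is exact.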

Bibliographic details

Source: International Journal on Document Analysis and Recognition, 2009, Issue 4, pp. 227-248 (22 pages)
Authors: Hadi Grailu; Mojtaba Lotfizad; Hadi Sadoghi-Yazdi
Author affiliations:

Department of Electrical Engineering, Tarbiat Modares University, Tehran, Iran;

Department of Electrical Engineering, Tarbiat Modares University, Tehran, Iran;

Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran;

Original format: PDF
Language: English
Keywords: document image compression; bi-level textual image compression; document segmentation; MRC model; OCR facility

