Journal: International Journal on Document Analysis and Recognition

Farsi and Arabic document images lossy compression based on the mixed raster content model



Abstract

Recently, the mixed raster content (MRC) model was proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have some disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods have focused on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off; however, they have not considered text readability or suitability for OCR. Third, these methods usually suffer from undesired jaggy artifacts and misclassify important textual details. In this paper, an MRC-based document image compression method is proposed that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in segmentation, bi-level mask layer compression, OCR suitability, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sensitive enough to detect small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art method DjVu, by a factor as high as 1.75-2.3.
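The MRC model referred to above represents a page as three layers: a bi-level mask marking text pixels, a foreground layer carrying text color, and a background layer carrying everything else. The minimal Python sketch below shows only how a decoder reassembles a page from those layers. The function name `mrc_compose` and the plain list-of-lists grayscale representation are illustrative assumptions, not part of the paper or of any codec's API. It also assumes all three layers share one resolution, whereas real MRC codecs such as DjVu often store the color layers at reduced resolution and upsample them.

```python
from typing import List

# Illustrative image type (assumption): rows of grayscale pixel values.
Image = List[List[int]]


def mrc_compose(mask: List[List[int]], foreground: Image, background: Image) -> Image:
    """Reassemble a page from three same-sized MRC layers (hypothetical helper).

    Where the bi-level mask is 1, the pixel comes from the foreground layer
    (text color); where it is 0, it comes from the background layer.
    """
    if not (len(mask) == len(foreground) == len(background)):
        raise ValueError("layers must have the same height")
    page: Image = []
    for m_row, f_row, b_row in zip(mask, foreground, background):
        if not (len(m_row) == len(f_row) == len(b_row)):
            raise ValueError("layers must have the same width")
        # Per-pixel selection driven by the mask.
        page.append([f if m else b for m, f, b in zip(m_row, f_row, b_row)])
    return page


if __name__ == "__main__":
    mask = [[1, 0], [0, 1]]          # 1 = text pixel
    fg = [[0, 0], [0, 0]]            # black text color
    bg = [[255, 255], [255, 255]]    # white paper
    print(mrc_compose(mask, fg, bg))  # [[0, 255], [255, 0]]
```

Because every text edge is decided by the mask, the mask's quality largely determines how crisp and OCR-readable the decoded text is, which is why segmentation and mask-layer compression are the focus of the method described in the abstract.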

