Published in: Procedia Computer Science

Visualizing CCITT Group 3 and Group 4 TIFF Documents and Transforming to Run-Length Compressed Format Enabling Direct Processing in Compressed Domain

Abstract

Data compression can go a long way toward easing the Big Data problem, particularly its storage and transmission costs. Documents, images, audio, and video are therefore preferably archived and communicated in compressed form. However, any subsequent operation on the compressed data requires decompression, which consumes additional computing resources. Developing techniques that operate on and analyze the contents of compressed data directly, without a decompression stage, is therefore a promising research problem. In the Document Image Analysis (DIA) literature, recent work has reported direct processing of run-length compressed document data, specifically targeting CCITT Group 3 1-D documents. Run-length data is also the backbone of the more advanced CCITT compression schemes, CCITT Group 3 2-D (T.4) and CCITT Group 4 2-D (T.6), which are widely supported by the TIFF and PDF formats. This paper therefore proposes generating the run-length data from T.4 and T.6 compressed data, extending direct document processing in the Run-Length Compressed Domain (RLCD) to these formats. The run-length data generated by the proposed algorithm is validated experimentally, and 100% correlation is reported on a data set of compressed documents. Finally, text segmentation and word spotting applications in RLCD are demonstrated.
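To make the idea of working in the run-length domain concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each scan line is already available as a list of alternating white/black run lengths that starts with a white run (possibly of length 0), as in CCITT run-length coding. All function names (row_to_runs, runs_to_row, black_pixel_count, text_line_bands) are hypothetical and chosen only for illustration. The sketch counts black pixels per row from the runs alone and uses the resulting profile to find candidate text-line bands, one simple form of text segmentation.

```python
from typing import List, Tuple


def row_to_runs(row: List[int]) -> List[int]:
    """Encode one binary row (0 = white, 1 = black) as alternating run lengths.

    The first run is always white, so a row that starts with a black pixel
    begins with a white run of length 0.
    """
    runs = []
    current = 0  # colour of the run being counted; every row starts with white
    length = 0
    for pixel in row:
        if pixel == current:
            length += 1
        else:
            runs.append(length)
            current = pixel
            length = 1
    runs.append(length)
    return runs


def runs_to_row(runs: List[int]) -> List[int]:
    """Expand alternating run lengths back into pixels (used only for checking)."""
    row = []
    colour = 0
    for length in runs:
        row.extend([colour] * length)
        colour ^= 1  # runs alternate white, black, white, ...
    return row


def black_pixel_count(runs: List[int]) -> int:
    """Count black pixels without expanding the row: black runs sit at odd indices."""
    return sum(runs[1::2])


def text_line_bands(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, whose black count is non-zero."""
    bands = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


# Tiny illustrative image: two "text lines" separated by a blank row.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
run_rows = [row_to_runs(r) for r in image]
profile = [black_pixel_count(r) for r in run_rows]
bands = text_line_bands(profile)
```

The projection profile and the bands are computed from run lengths only; runs_to_row is included just to confirm that the run representation is lossless.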
