
Visualizing CCITT group 3 and group 4 TIFF documents and transforming to run-length compressed format enabling direct processing in compressed domain

Abstract

Data compression can be thought of as an avenue to overcome the Big Data problem to a large extent, particularly to combat storage and transmission issues. In this context, documents, images, audio and video are preferably archived and communicated in compressed form. However, any subsequent operation on the compressed data requires decompression, which implies additional computing resources. Therefore, developing novel techniques to operate on and analyze the contents of compressed data directly, without a decompression stage, is a potential research issue. Recently, the Document Image Analysis (DIA) literature has reported some works on direct processing of run-length compressed document data, specifically targeted at CCITT Group 3 1-D documents. Since run-length data is the backbone of the more advanced CCITT compression schemes, such as CCITT Group 3 2-D (T.4) and CCITT Group 4 2-D (T.6), which are widely supported by the TIFF and PDF formats, this paper proposes to intelligently generate run-length data from the compressed data of T.4 and T.6, and thus extend the idea of direct processing of documents in the Run-Length Compressed Domain (RLCD). The run-length data generated by the proposed algorithm is experimentally validated, and 100% correlation is reported on a dataset of compressed documents. Finally, text segmentation and word spotting applications in RLCD are also demonstrated.
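
The abstract does not spell out the conversion algorithm. As a rough illustration of why T.4/T.6 data can be turned back into runs without rasterizing the page, the sketch below follows the 2-D coding rules of ITU-T T.4/T.6, where each coding line is described relative to the line above it through changing elements a0, a1, a2 (coding line) and b1, b2 (reference line), using pass, horizontal and vertical modes. It is a minimal sketch, not the authors' method: it assumes the Huffman code words have already been parsed into hypothetical mode tokens (`("P",)`, `("H", r1, r2)`, `("V", d)`), assumes every row is 2-D coded as in T.6 (T.4's interleaved 1-D rows and EOL codes are not handled), and all function names are hypothetical. Each row comes out directly as alternating white/black run lengths, starting with a possibly empty white run.

```python
from typing import List, Sequence, Tuple

# Hypothetical, pre-parsed 2-D mode tokens (Huffman/bit-level parsing omitted):
#   ("P",)          pass mode
#   ("H", r1, r2)   horizontal mode with run lengths a0a1 and a1a2
#   ("V", d)        vertical mode, a1 = b1 + d with -3 <= d <= 3
Mode = tuple


def find_b1_b2(ref: Sequence[int], a0: int, a0_black: bool, width: int) -> Tuple[int, int]:
    """Return (b1, b2) on the reference line.

    `ref` lists the reference line's changing elements in increasing order;
    even indices are white->black changes (black pixels), odd indices are
    black->white changes (white pixels). b1 is the first changing element to
    the right of a0 whose colour is opposite to a0's; b2 is the next one.
    Missing elements default to the imaginary position `width`.
    """
    want_black = not a0_black
    for i, x in enumerate(ref):
        if x > a0 and (i % 2 == 0) == want_black:
            b2 = ref[i + 1] if i + 1 < len(ref) else width
            return x, b2
    return width, width


def decode_line(modes: Sequence[Mode], ref: Sequence[int], width: int) -> List[int]:
    """Rebuild the coding line's changing elements from its 2-D mode tokens."""
    changes: List[int] = []
    a0, a0_black = -1, False  # imaginary white element just before the line
    for mode in modes:
        b1, b2 = find_b1_b2(ref, a0, a0_black, width)
        if mode[0] == "P":
            a0 = b2  # a0 moves under b2; the current run keeps its colour
        elif mode[0] == "H":
            a1 = max(a0, 0) + mode[1]  # at line start the run counts from pixel 0
            a2 = a1 + mode[2]
            changes += [a1, a2]
            a0 = a2  # two colour changes, so a0 keeps its colour
        elif mode[0] == "V":
            a1 = b1 + mode[1]
            changes.append(a1)
            a0, a0_black = a1, not a0_black
        else:
            raise ValueError(f"unknown mode token: {mode!r}")
    # A change at `width` only marks the end of the line, not a real pixel.
    return [x for x in changes if x < width]


def changes_to_runs(changes: Sequence[int], width: int) -> List[int]:
    """Alternating white/black run lengths; the first (white) run may be 0."""
    bounds = [0, *changes, width]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def decode_page(coded_lines: Sequence[Sequence[Mode]], width: int) -> List[List[int]]:
    """Convert 2-D coded rows to run-length rows, one reference line at a time."""
    ref: List[int] = []  # the line above the first row is imaginary all-white
    rows: List[List[int]] = []
    for modes in coded_lines:
        changes = decode_line(modes, ref, width)
        rows.append(changes_to_runs(changes, width))
        ref = changes
    return rows


def black_pixels(runs: Sequence[int]) -> int:
    """Black pixel count of one row, read straight from its runs."""
    return sum(runs[1::2])


# Usage: three 10-pixel rows coded against an all-white line above the first row.
coded = [
    [("H", 2, 3), ("H", 2, 2), ("V", 0)],
    [("V", 1), ("V", 1), ("V", 0), ("V", 0), ("V", 0)],
    [("P",), ("V", 2), ("V", 0)],
]
rows = decode_page(coded, width=10)
print(rows)                             # [[2, 3, 2, 2, 1], [3, 3, 1, 2, 1], [9, 1]]
print([black_pixels(r) for r in rows])  # [5, 5, 1]
```

Because each row ends up as a list of runs, simple measurements can stay in the run-length domain: `black_pixels` sums the odd-indexed (black) runs, giving a horizontal projection profile of the kind commonly used for line segmentation, without building a bitmap. The abstract does not state which features the paper's segmentation and word spotting actually use.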
