This paper proposes a document processing system that combines image segmentation with content-based document compression. First, a grayscale document image is divided into small blocks and analysed. Then, a modified logical thresholding method, based on local structure analysis and the adaptive logical level technique, is used to transform the grayscale document into a binary image. All patterns are extracted from the binary document, and a multistage matching method is used to select representative patterns. A decomposition method is used to deal with relatively large patterns. Finally, a high compression ratio is achieved by coding the relative positions of symbols, the extracted representative patterns, and the other decomposed patterns using the adaptive arithmetic coder and the Q-Coder, respectively.
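To make the pattern-extraction and matching stages concrete, the following is a minimal sketch rather than the method of the paper. It assumes an already binarized page, treats each 8-connected component as a pattern, and replaces the multistage matching with a single greedy pass that compares same-sized bitmaps by pixel mismatch count. The function names (`extract_patterns`, `mismatch`, `build_library`, `relative_positions`) and the `max_mismatch` tolerance are hypothetical, and the thresholding, decomposition, and entropy-coding stages are omitted.

```python
from collections import deque


def extract_patterns(binary):
    """Return (top, left, bitmap) for each 8-connected foreground component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    patterns = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects the pixels of one component.
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Crop the component to its bounding box.
            top = min(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            height = max(p[0] for p in pixels) - top + 1
            width = max(p[1] for p in pixels) - left + 1
            bitmap = [[0] * width for _ in range(height)]
            for y, x in pixels:
                bitmap[y - top][x - left] = 1
            patterns.append((top, left, tuple(map(tuple, bitmap))))
    return patterns


def mismatch(a, b):
    """Count differing pixels, or return None if the bitmap sizes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_library(patterns, max_mismatch=1):
    """Greedy single-stage matching (a stand-in for multistage matching)."""
    library, symbols = [], []
    for top, left, bitmap in patterns:
        for index, representative in enumerate(library):
            d = mismatch(bitmap, representative)
            if d is not None and d <= max_mismatch:
                break  # reuse an existing representative pattern
        else:
            index = len(library)  # no match: this bitmap becomes a representative
            library.append(bitmap)
        symbols.append((index, top, left))
    return library, symbols


def relative_positions(symbols):
    """Express each symbol position as an offset from the previous symbol."""
    out, prev_top, prev_left = [], 0, 0
    for index, top, left in symbols:
        out.append((index, top - prev_top, left - prev_left))
        prev_top, prev_left = top, left
    return out


# Toy page: two identical "C"-like shapes followed by a vertical bar.
page = [
    [1, 1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 1],
]
library, symbols = build_library(extract_patterns(page))
offsets = relative_positions(symbols)
```

In this toy page the two identical shapes share one representative, so only two bitmaps plus three (index, offset) triples would need to be coded. In a full system these streams would then be passed to an entropy coder, which is the role the abstract assigns to the adaptive arithmetic coder and the Q-Coder.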