Locations of word images corresponding to words in a document image are ascertained. The word images are grouped into clusters. For each of multiple of the clusters, a respective compressed word image cluster is determined based on a joint compression of respective ones of the word images that are grouped into the cluster. The positions of the word images in the document image are associated with the respective ones of the compressed word image clusters corresponding to the clusters respectively containing the word images.
展开▼