This paper presents an edge-based method for block segmentation and classification with automatic character string extraction for document analysis. By using only four edge features derived from the gradient and orientation of the edge pixels, the method makes block segmentation, classification, and character string extraction insensitive to background noise and to brightness variation in the image. The blocks of a document image are classified into seven categories: small-sized letters, large-sized letters, tables, equations, flow charts, graphs, and photographs. The first five are character blocks (blocks containing text), and the last two are non-character blocks. Introducing the column and text-line intervals of the document into the CRLA (constrained run length algorithm) yields an efficient block segmentation with reduced memory requirements. Simulation results show that document image segmentation, block classification, and character string extraction can be performed concurrently and efficiently.
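To illustrate the CRLA step mentioned above, the following is a minimal sketch of generic constrained run-length smoothing on a binary image (1 = foreground, 0 = background). It is not the paper's variant, which the abstract does not specify. The function names `smooth_runs` and `crla` and the parameters `h_gap` and `v_gap` are hypothetical; here the two thresholds merely stand in for the column and text-line intervals the abstract refers to.

```python
def smooth_runs(row, max_gap):
    """Fill background runs of length <= max_gap that lie between foreground pixels."""
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1
            # Only fill gaps bounded by foreground on both sides (assumed convention).
            if i > 0 and j < n and (j - i) <= max_gap:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out


def crla(image, h_gap, v_gap):
    """Smooth rows and columns separately, then AND the two results.

    h_gap and v_gap are assumed thresholds (in pixels); in the paper they
    would presumably be tied to the column and text-line intervals.
    """
    horizontal = [smooth_runs(r, h_gap) for r in image]
    columns = [smooth_runs(list(c), v_gap) for c in zip(*image)]
    vertical = [list(r) for r in zip(*columns)]
    return [[h & v for h, v in zip(hr, vr)]
            for hr, vr in zip(horizontal, vertical)]
```

In this sketch, pixels that survive both the horizontal and the vertical smoothing form connected regions that can be treated as candidate blocks. Larger thresholds merge more of the page into a single block.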