Abstract: In this paper, we describe a feature based supervised zone classifier using only the knowledge of the widths and the heights of the connected-components within a given zone. The distribution of the widths and the heights of the connected-components is encoded into a n multiplied by m dimensional vector in the decision making. Thus, the computational complexity is in the order of the number of connected-components within the given zone. A binary decision tree is used to assign a zone class on the basis of its feature vector. The training and testing data sets for the algorithm are drawn from the scientific document pages in the UW-I database. The classifier is able to classify each given scientific and technical document zone into one of the eight labels: text of font size 8-12, text of font size 13-18, text of font size 19-36, display math, table, halftone, line drawing, and ruling, in real time. The classifier is able to discriminate text from non-text with an accuracy greater than 97%. !8
展开▼