Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document for further processing. A pre-processing step of document analysis methods is a skew estimation of scanned or photographed documents. Current skew estimation methods require the existence of large text areas, are dependent on the text type and can be limited on a specific angle range. The proposed method is gradient based in combination with a Focused Nearest Neighbor Clustering of interest points and has no limitations regarding the detectable angle range. The upside/down decision is based on statistical analysis of ascenders and descenders. It can be applied to entire documents as well as to document fragments containing only a few words. Results show that the proposed skew estimation is comparable with state-of-the-art methods and outperforms them on a real dataset consisting of 658 snippets.
展开▼