Abstract: Interest in document image skew detection algorithms has increased recently. Most papers on this problem include some experimental results, yet there is no universally accepted methodology for evaluating the performance of such algorithms. We implemented four types of skew detection algorithms in order to investigate possible testing methodologies, and tested each algorithm on a sample of 460 page images randomly selected from a collection of approximately 100,000 pages. This collection contains a wide variety of typographical features and styles. In our evaluation we examine several issues relevant to establishing a uniform testing methodology. First, we give a clear definition of the problem and of the ground truth collection process. Then we examine the need for pre-processing and parameter optimization specific to each technique. Next, we investigate how to establish meaningful statistical measurements of the performance of these algorithms, and the use of non-parametric methods to perform pairwise comparisons between them. Lastly, we look at the sensitivity of each algorithm to particular typographical features, which indicates the need for a stratified sampling paradigm for accurate analysis of performance.