Abstract: We have previously discussed a morphologically based nonlinear document degradation model that characterizes the perturbations introduced by printing and scanning [KHP93, KHP94]. In this paper we use the nonparametric estimation algorithm discussed in [KHP93, KHP94] to estimate the sizes of the structuring elements of the degradation model. Other parameters of the model can be estimated in a similar fashion. Thus, given a small sample of real scanned documents, we can estimate the parameters of the model with the nonparametric estimation algorithm and use the estimated parameters to create a large sample of simulated documents whose degradation characteristics are similar to those of the real scanned documents. The large simulated sample can then be used for various purposes, for example, training classifiers, estimating the performance of OCR algorithms, and choosing parameter values in noise removal algorithms.
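The simulation step described above can be illustrated with a minimal sketch. It is not the model of [KHP93, KHP94]: it assumes a simplified degradation in which pixels are flipped independently with a fixed probability and the result is then morphologically closed with a square structuring element. The names `degrade` and `simulate`, the parameter `flip_prob`, and the half-width `r` are hypothetical and chosen only for illustration; in the workflow of the abstract, `r` would stand in for an estimated structuring element size.

```python
import random


def dilate(img, r):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Turn on every pixel in the window around (y, x), clipped to the image.
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = 1
    return out


def erode(img, r):
    """Binary erosion via duality: complement, dilate, complement.

    Pixels outside the image are effectively treated as foreground.
    """
    inverted = [[1 - p for p in row] for row in img]
    return [[1 - p for p in row] for row in dilate(inverted, r)]


def closing(img, r):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, r), r)


def degrade(img, flip_prob, r, rng):
    """Assumed simplified degradation: independent pixel flips, then closing."""
    noisy = [[1 - p if rng.random() < flip_prob else p for p in row] for row in img]
    return closing(noisy, r)


def simulate(ideal, flip_prob, r, n, seed=0):
    """Create n simulated degraded copies of an ideal binary image."""
    rng = random.Random(seed)
    return [degrade(ideal, flip_prob, r, rng) for _ in range(n)]
```

Given values of `flip_prob` and `r` estimated from a small real sample, a call such as `simulate(ideal, flip_prob, r, 1000)` would produce a large simulated sample under these simplifying assumptions.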