In this paper, we explore the use of Local Binary Pattern (LBP) descriptors and a variance measure to develop efficient techniques for segmenting a large collection of historical machine-printed document pages. The segmentation result helps organize the document pages in a structured format, which is useful in applications such as historical document access. In our experiments, three basic reference models, namely background, text, and image models, are used to segment non-text content together with the text. The method is tested on an archive of Portuguese historical documents and shows promising results.
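
As a rough illustration of the two texture measures named above, the following minimal sketch computes a basic 8-neighbour LBP code and the variance of the neighbouring gray values for each interior pixel of a small grayscale image. The abstract does not state the neighbourhood size, radius, bit ordering, or how the measures are compared with the reference models, so the radius-1 neighbourhood, the clockwise ordering, and the function names `lbp_and_variance` and `describe` are assumptions made only for this example.

```python
# Offsets of the 8 neighbours at radius 1, clockwise from top-left
# (an assumed ordering; the abstract does not specify one).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_and_variance(image, row, col):
    """Return (LBP code, neighbour variance) for one interior pixel.

    `image` is a list of rows of gray values (hypothetical input format).
    """
    center = image[row][col]
    values = [image[row + dr][col + dc] for dr, dc in NEIGHBOURS]
    # Each neighbour that is >= the centre contributes one bit to the code.
    code = sum(1 << bit for bit, v in enumerate(values) if v >= center)
    # Population variance of the neighbour gray values (local contrast).
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return code, variance


def describe(image):
    """Return a 256-bin LBP histogram and the mean local variance
    over all interior pixels of `image`."""
    hist = [0] * 256
    total_var = 0.0
    count = 0
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            code, var = lbp_and_variance(image, r, c)
            hist[code] += 1
            total_var += var
            count += 1
    return hist, (total_var / count if count else 0.0)
```

In a segmentation setting, a descriptor of this kind could be computed per image block and compared against descriptors of background, text, and image reference blocks. That comparison step is not described in the abstract and is therefore left out of the sketch.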