Describes a syntactic approach to deducing the logical structure of printed documents from their physical layout. Page layout is described by a two-dimensional grammar, similar to a context-free string grammar, and a chart parser is used to parse segmented page images according to the grammar. This process is part of a system which reads scanned document images and produces computer-readable text in a logical mark-up format such as SGML. The system is briefly outlined, the grammar formalism and the parsing algorithm are described in detail, and some experimental results are reported.
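
To make the idea of chart parsing over a two-dimensional grammar concrete, the following is a minimal sketch and not the formalism of the paper itself. It assumes that segmentation has already produced labelled rectangular blocks, that every rule is binary, and that the only spatial relations are `above` and `left_of`. The rule set, the block labels, the coordinates, and the names `Item`, `RULES`, `parse`, and `spans_page` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A chart entry: a grammar symbol covering a set of page blocks."""
    label: str
    box: tuple          # (x0, y0, x1, y1); y grows downward
    cover: frozenset    # indices of the segmented blocks this item spans


# Hypothetical binary rules: (parent, first child, second child, relation).
RULES = [
    ("Header", "Title", "Author", "above"),
    ("Body", "Paragraph", "Paragraph", "above"),
    ("Page", "Header", "Body", "above"),
]


def related(a, b, relation):
    """Check a simple spatial relation between two bounding boxes."""
    if relation == "above":
        return a.box[3] <= b.box[1]   # a ends before b starts vertically
    if relation == "left_of":
        return a.box[2] <= b.box[0]   # a ends before b starts horizontally
    raise ValueError(f"unknown relation: {relation}")


def merge(label, a, b):
    """Build the parent item whose box encloses both children."""
    box = (min(a.box[0], b.box[0]), min(a.box[1], b.box[1]),
           max(a.box[2], b.box[2]), max(a.box[3], b.box[3]))
    return Item(label, box, a.cover | b.cover)


def parse(blocks, rules=RULES):
    """Bottom-up, agenda-driven chart parse of (label, box) blocks."""
    chart = set()
    agenda = [Item(label, box, frozenset([i]))
              for i, (label, box) in enumerate(blocks)]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        # Try the new item with everything already in the chart, both orders.
        for other in list(chart):
            for a, b in ((item, other), (other, item)):
                if a.cover & b.cover:
                    continue          # children must not share a block
                for parent, first, second, relation in rules:
                    if (a.label == first and b.label == second
                            and related(a, b, relation)):
                        agenda.append(merge(parent, a, b))
    return chart


def spans_page(chart, n_blocks):
    """True if some 'Page' item covers every input block."""
    everything = frozenset(range(n_blocks))
    return any(it.label == "Page" and it.cover == everything for it in chart)


if __name__ == "__main__":
    blocks = [
        ("Title", (0, 0, 100, 10)),
        ("Author", (0, 12, 100, 20)),
        ("Paragraph", (0, 30, 100, 60)),
        ("Paragraph", (0, 62, 100, 90)),
    ]
    print(spans_page(parse(blocks), len(blocks)))
```

In this sketch the chart plays the same role as in string chart parsing: every recognised constituent is stored once and reused, except that items are indexed by the set of blocks they cover rather than by a start and end position in a string. The `above` test here does not require the two regions to be adjacent, which a practical grammar would likely need to enforce.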