Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents
Abstract
A method identifies the header/footer content of a document so that text fragments, comprising recognizable text blocks derived from the document, can be sequenced. The textual variability of lines composed of text blocks is assessed, taking into account the different kinds of text blocks within each line. Header/footer zones are defined by textual content having low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text boxes for similarity and proximity and clustering the text boxes that satisfy a predetermined similarity value; the clustered text boxes are deemed to comprise pagination constructs.
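The low-variability idea in the first embodiment can be illustrated with a minimal sketch. This is not the patented method: the variability measure (distinct normalized strings divided by line count), the digit normalization, the `threshold` and `max_depth` defaults, and all function names are assumptions introduced here for illustration only.

```python
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    """Collapse digit runs so 'Page 3' and 'Page 4' compare equal (assumed heuristic)."""
    return re.sub(r"\d+", "#", line.strip().lower())


def variability(lines: List[str]) -> float:
    """Ratio of distinct normalized strings to line count; low means repetitive."""
    if not lines:
        return 1.0
    return len({normalize(text) for text in lines}) / len(lines)


def zone_depth(pages: List[List[str]], from_top: bool,
               max_depth: int = 5, threshold: float = 0.5) -> int:
    """Count consecutive line positions, from the top or bottom edge of the page,
    whose text varies little across pages."""
    depth = 0
    for i in range(max_depth):
        idx = i if from_top else -(i + 1)
        # Gather the line at this position from every page that is long enough.
        column = [page[idx] for page in pages if len(page) > i]
        if len(column) < 2 or variability(column) > threshold:
            break
        depth += 1
    return depth


def detect_zones(pages: List[List[str]]) -> Tuple[int, int]:
    """Return (header_lines, footer_lines) for a list of pages of text lines."""
    return zone_depth(pages, from_top=True), zone_depth(pages, from_top=False)


if __name__ == "__main__":
    sample = [
        ["ACME Annual Report", "Introduction text alpha", "more body one", "Page 1"],
        ["ACME Annual Report", "Different body beta", "more body two", "Page 2"],
        ["ACME Annual Report", "Yet another gamma", "more body three", "Page 3"],
    ]
    print(detect_zones(sample))  # → (1, 1)
```

In the sample, the first line is identical on every page and the last line differs only by its page number, so each is treated as a one-line zone. The body lines differ on every page and end the scan. The alternative embodiment (clustering text boxes by similarity and proximity) is not covered by this sketch.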