A method for processing an electronic document (ED) to infer titles in the ED is provided. The method includes: generating a mark-up version of the ED comprising text-styling attributes, text-layout attributes, and text content information of characters included in the ED; generating statistical information of the text-styling and text-layout attributes; calculating, for each text-styling and text-layout attribute, a relative weight score; calculating, for each paragraph in the ED, (i) a styling criteria score and a layout criteria score based on the statistical information and the relative weight scores, (ii) a text content score based on the text content information, and (iii) a title confidence score based on the styling criteria score, the layout criteria score, and the text content score; and generating metadata for the ED that includes the title confidence score for each paragraph, for use in inferring the titles in the ED.
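
The scoring pipeline described above can be illustrated with a minimal Python sketch. The abstract does not specify how any score is computed, so everything here is an assumption made for illustration: the `Paragraph` fields, the choice of attributes, the rarity-based criteria scores, the dominance-based relative weights, the two text heuristics, and the plain average used for the title confidence score are all hypothetical and are not taken from the claimed method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical stand-in for one paragraph of the mark-up version of the ED."""
    text: str
    font_size: float     # text-styling attribute (assumed)
    bold: bool           # text-styling attribute (assumed)
    alignment: str       # text-layout attribute (assumed)
    space_before: float  # text-layout attribute (assumed)


STYLING = ("font_size", "bold")
LAYOUT = ("alignment", "space_before")


def attribute_statistics(paragraphs):
    """Statistical information: how often each attribute value occurs in the ED."""
    return {a: Counter(getattr(p, a) for p in paragraphs) for a in STYLING + LAYOUT}


def relative_weights(stats):
    """Assumed weighting: an attribute with one dominant value gets more weight,
    because a deviation from that value is more informative. Weights sum to 1."""
    raw = {a: max(c.values()) / sum(c.values()) for a, c in stats.items()}
    norm = sum(raw.values())
    return {a: w / norm for a, w in raw.items()}


def criteria_score(paragraph, attrs, stats, weights):
    """Weighted rarity of the paragraph's attribute values, scaled to [0, 1]."""
    total_weight = sum(weights[a] for a in attrs)
    score = 0.0
    for a in attrs:
        counts = stats[a]
        rarity = 1.0 - counts[getattr(paragraph, a)] / sum(counts.values())
        score += weights[a] * rarity
    return score / total_weight if total_weight else 0.0


def text_content_score(paragraph):
    """Assumed heuristics: titles tend to be short and not end with a period."""
    words = paragraph.text.split()
    if not words:
        return 0.0
    is_short = 1.0 if len(words) <= 10 else 0.0
    no_period = 0.0 if paragraph.text.rstrip().endswith(".") else 1.0
    return (is_short + no_period) / 2


def title_confidence_metadata(paragraphs):
    """Return one metadata record per paragraph with its title confidence score."""
    if not paragraphs:
        return []
    stats = attribute_statistics(paragraphs)
    weights = relative_weights(stats)
    metadata = []
    for index, p in enumerate(paragraphs):
        styling = criteria_score(p, STYLING, stats, weights)
        layout = criteria_score(p, LAYOUT, stats, weights)
        content = text_content_score(p)
        # Assumed combination: unweighted mean of the three scores.
        confidence = (styling + layout + content) / 3
        metadata.append({"paragraph": index, "title_confidence": round(confidence, 3)})
    return metadata
```

Under these assumptions, a paragraph whose styling and layout differ from the bulk of the document and whose text is short receives a higher title confidence score than ordinary body paragraphs. A downstream step could then infer titles by thresholding or ranking the scores stored in the metadata.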