A method called normalized template matching is described which detects new content in any given target web page with reference to a control web page. The control page is typically an older version of the given target web page. The control page is first divided into sections by using non-formatting HTML tags. Each section is then individually normalized by removing formatting HTML tags, meta characters and repetitive white space, after which it is inserted into a template in sorted order. Each entry in the template consists of the section content and its type. Once the entire control page is processed, a normalized template is obtained. The target page is similarly divided and normalized, after which each section of the target template is matched against the entries in the control template. Any section of the target page which does not match the control template is flagged as new content and presented in a summary page to the user.
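The steps above can be illustrated with a minimal Python sketch. It is not the described implementation, only one possible reading of it. Several details are assumptions made for illustration: the set `FORMATTING_TAGS`, treating "meta characters" as non-printable control characters, using the opening tag name as the section "type", labeling text that follows a closing tag as `"text"`, skipping `script` and `style` content, and the names `SectionSplitter`, `normalize`, `build_template` and `find_new_content`.

```python
import bisect
from html.parser import HTMLParser

# Assumption: the abstract does not list which tags count as "formatting".
FORMATTING_TAGS = {"b", "i", "u", "em", "strong", "font", "span",
                   "small", "big", "sub", "sup"}
# Assumption: script and style bodies are not page content.
SKIPPED_TYPES = {"script", "style"}


def normalize(text):
    """Drop non-printable characters and collapse runs of white space."""
    kept = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return " ".join(kept.split())


class SectionSplitter(HTMLParser):
    """Splits a page into (content, type) sections at non-formatting tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.sections = []
        self._type = "text"
        self._buf = []

    def _flush(self, next_type):
        content = normalize("".join(self._buf))
        if content and self._type not in SKIPPED_TYPES:
            self.sections.append((content, self._type))
        self._buf = []
        self._type = next_type

    def handle_starttag(self, tag, attrs):
        # Formatting tags are ignored, so their text stays in the section.
        if tag not in FORMATTING_TAGS:
            self._flush(tag)

    def handle_endtag(self, tag):
        if tag not in FORMATTING_TAGS:
            self._flush("text")

    def handle_data(self, data):
        self._buf.append(data)

    def finish(self):
        """Emit any text left over after the last tag."""
        self._flush("text")


def build_template(html):
    """Return the normalized template: a sorted list of (content, type)."""
    splitter = SectionSplitter()
    splitter.feed(html)
    splitter.close()
    splitter.finish()
    template = []
    for entry in splitter.sections:
        bisect.insort(template, entry)  # keep entries in sorted order
    return template


def find_new_content(control_html, target_html):
    """Return target entries that have no exact match in the control."""
    control = build_template(control_html)
    new_entries = []
    for entry in build_template(target_html):
        i = bisect.bisect_left(control, entry)  # binary search
        if i == len(control) or control[i] != entry:
            new_entries.append(entry)
    return new_entries


if __name__ == "__main__":
    control = "<body><h1>News</h1><p>Old <b>story</b> here.</p></body>"
    target = ("<body><h1>News</h1><p>Old   <i>story</i> here.</p>"
              "<p>Fresh item</p></body>")
    print(find_new_content(control, target))  # [('Fresh item', 'p')]
```

In this sketch the paragraph whose emphasis changed from bold to italic still matches, because formatting tags and extra white space are removed before comparison, so only the added paragraph is reported. Keeping the template sorted is what allows each target section to be looked up by binary search rather than by scanning every control entry.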