To realize Web information retrieval based on characteristic tree structured patterns in semistructured Web documents, methods for discovering frequent patterns or common characteristics in such documents are becoming increasingly important. We have studied methods for discovering maximally frequent tree structured patterns in semistructured Web documents. A tag tree pattern is an edge-labeled tree with ordered children and structured variables. An edge label of a tag tree pattern is a tag or a keyword in Web documents, or a wildcard that matches any string. Each variable matches any subtree and represents a field of a Web document. A tag tree pattern is much more expressive than an ordinary tree structured pattern. To represent tree structured patterns with rich structural features, we introduce a new kind of variable, called a height-constrained variable. An
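The notion of a tag tree pattern described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the formal definition used in this work: the names `Node`, `Var`, `WILDCARD`, and `matches` are hypothetical, a variable is placed where a subtree would hang and matches whatever subtree is found there, and height-constrained variables are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical marker for an edge label that matches any string.
WILDCARD = "?"


@dataclass
class Var:
    """A variable: in this simplified sketch it matches any subtree."""
    name: str


@dataclass
class Node:
    """A tree node whose outgoing edges are ordered (label, child) pairs."""
    edges: List[Tuple[str, Union["Node", Var]]] = field(default_factory=list)


def matches(pattern: Union[Node, Var], tree: Node) -> bool:
    """Return True if the pattern matches the tree, respecting child order."""
    if isinstance(pattern, Var):
        return True  # assumption: a variable accepts any subtree
    if len(pattern.edges) != len(tree.edges):
        return False
    for (p_label, p_child), (t_label, t_child) in zip(pattern.edges, tree.edges):
        # An edge label must be equal unless the pattern uses the wildcard.
        if p_label != WILDCARD and p_label != t_label:
            return False
        if not matches(p_child, t_child):
            return False
    return True


# A small document tree: html -> (title -> "News", body -> p -> "hello").
doc = Node([
    ("html", Node([
        ("title", Node([("News", Node())])),
        ("body", Node([("p", Node([("hello", Node())]))])),
    ])),
])

# Pattern: any title text, and a variable x standing for the whole body field.
pattern = Node([
    ("html", Node([
        ("title", Node([(WILDCARD, Node())])),
        ("body", Var("x")),
    ])),
])

# Same pattern with the children swapped; order matters, so it should not match.
swapped = Node([
    ("html", Node([
        ("body", Var("x")),
        ("title", Node([(WILDCARD, Node())])),
    ])),
])
```

Here `matches(pattern, doc)` holds because the wildcard accepts the title keyword and the variable absorbs the body subtree, while `matches(swapped, doc)` fails because children are ordered.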