We address the problem of clustering XML data according to semantically enriched features extracted by analyzing both the content and the structure of the data. Content features are selected from the textual contents of XML elements, while structure features are extracted from XML tag paths on the basis of ontological knowledge. Moreover, we devise a transactional model for representing sets of semantically cohesive XML structures, and exploit this model to cluster XML data effectively and efficiently. The resulting clustering framework was successfully tested on collections extracted from the DBLP XML archive.
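To make the idea of a transactional representation concrete, the following is a minimal sketch and not the model proposed here: it assumes each XML document becomes a set of (tag path, content token) items, and it compares documents with plain Jaccard similarity. The function names `tag_path_items` and `jaccard`, the toy documents, and the choice of Jaccard are illustrative assumptions, and the ontology-based enrichment of tag paths is omitted entirely.

```python
import xml.etree.ElementTree as ET


def tag_path_items(xml_text):
    """Return a set of (tag path, lowercase token) items for one XML document.

    Hypothetical helper: a simplified stand-in for a transactional
    representation, with no ontological enrichment of tag names.
    """
    root = ET.fromstring(xml_text)
    items = set()

    def walk(node, path):
        current = path + "/" + node.tag
        # Each whitespace-separated token of the element text becomes one item.
        for token in (node.text or "").lower().split():
            items.add((current, token))
        for child in node:
            walk(child, current)

    walk(root, "")
    return items


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy documents loosely shaped like bibliographic records (invented data).
doc_a = "<article><author>Ann Lee</author><title>XML clustering</title></article>"
doc_b = "<article><author>Ann Lee</author><title>Graph mining</title></article>"
doc_c = "<book><editor>Bo Chan</editor><title>Databases</title></book>"

t_a, t_b, t_c = (tag_path_items(d) for d in (doc_a, doc_b, doc_c))

print(jaccard(t_a, t_b))  # shares the two author items
print(jaccard(t_a, t_c))  # no shared tag path and token pair
```

In this sketch, documents with the same tag paths and overlapping text score higher than documents with different structure, which is the kind of signal a transaction-based clustering algorithm could use to group them.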