In this paper, we investigate the problem of clustering XML documents based on their structure. We represent the paths in an XML document as a multiset and use the symmetric difference operation on multisets to define certain metrics. These metrics are then used to obtain a measure of similarity between any two documents in a collection. Our technique was successfully applied to real and synthesized XML documents yielding high-quality clusterings.
展开▼